Abstract. We study the problem of computing the probability that a
given stochastic context-free grammar (SCFG), G, generates a string in
a given regular language L(D) (given by a DFA, D). This basic problem
has a number of applications in statistical natural language processing,
and it is also a key necessary step towards quantitative ω-regular model
checking of stochastic context-free processes (equivalently, 1-exit
recursive Markov chains, or stateless probabilistic pushdown processes).
We show that the probability that G generates a string in L(D) can
be computed to within arbitrary desired precision in polynomial time
(in the standard Turing model of computation), under a rather mild
assumption about the SCFG, G, and with no extra assumption about D.
We show that this assumption is satisfied for SCFGs whose rule
probabilities are learned via the well-known inside-outside (EM) algorithm
for maximum-likelihood estimation (a standard method for constructing
SCFGs in statistical NLP and biological sequence analysis). Thus, for
these SCFGs the algorithm always runs in P-time.



1 Introduction 

Stochastic (or Probabilistic) Context-Free Grammars (SCFG) are context-free
grammars where the rules (productions) have associated probabilities. They are
a central stochastic model, widely used in natural language processing [14], with
applications also in biology (e.g. [HIT^]). A SCFG G generates a language L(G)
(like an ordinary CFG) and assigns a probability to every string in the language.
SCFGs have been extensively studied since the 1970s. A number of important
problems on SCFGs can be viewed as instances of the following regular pattern
matching problem for different regular languages:

Given a SCFG G and a regular language L, given e.g. by a deterministic
finite automaton (DFA) D, compute the probability $P_G(L)$ that G generates a
string in L, i.e. compute the sum of the probabilities of all the strings in L.

A simple example is when $L = \Sigma^*$, the set of all strings over the terminal
alphabet $\Sigma$ of the SCFG G. Then this problem simply asks to compute the
probability $P_G(L(G))$ of the language $L(G)$ generated by the grammar G.
Alternatively, if we view the SCFG as a stochastic process that starts from the
start nonterminal, repeatedly applies the probabilistic rules to replace (say,
leftmost) nonterminals, and terminates when a string of terminals is reached, then
$P_G(L(G))$ is simply the probability that this process terminates. Another simple
example is when L is a singleton, $L = \{w\}$, for some string w; in this case the
problem corresponds to the basic parsing question of computing the probability
that a given string w is generated by the SCFG G. Another basic well-studied
problem is the computation of prefix probabilities: given a SCFG G and a string
w, compute the probability that G generates a string with prefix w [11, 21].
This is useful in online processing in speech recognition [IT] and corresponds to
the case $L = w\Sigma^*$. A more complex problem is the computation of infix
probabilities [TJ I18|, where we wish to compute the probability that G generates a
string that contains a given string w as a substring, which corresponds to the
language $L = \Sigma^* w \Sigma^*$. In general, even when rule probabilities of the SCFG G
are rational, the probabilities we wish to compute can be irrational. Thus the
typical aim for "computing" them is to approximate them to desired precision.
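To make the irrationality remark concrete, here is a small illustration of our own (not taken from the paper). For the hypothetical SCFG with rules S → SSS (probability 1/2) and S → a (probability 1/2), the termination probability is the least non-negative solution of $x = x^3/2 + 1/2$, i.e. of $(x - 1)(x^2 + x - 1) = 0$, which is $(\sqrt{5} - 1)/2 \approx 0.618$. The sketch approximates it by plain fixed-point (Kleene) iteration from 0, which increases monotonically towards the least fixed point but converges only linearly; the function name `kleene_lfp` is ours.

```python
import math

def kleene_lfp(P, iterations=200):
    """Iterate x <- P(x) starting from x = 0. For a monotone P with a
    non-negative fixed point, the iterates increase towards the least one."""
    x = 0.0
    for _ in range(iterations):
        x = P(x)
    return x

# Hypothetical SCFG: S -> S S S (prob 1/2), S -> a (prob 1/2).
P = lambda x: 0.5 * x**3 + 0.5

q = kleene_lfp(P)
closed_form = (math.sqrt(5) - 1) / 2   # least root of (x - 1)(x^2 + x - 1) = 0
print(q, closed_form)                  # both approximately 0.6180339887
```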

Stochastic context-free grammars are closely related to 1-exit recursive Markov 
chains (1-RMC) [5], and to stateless probabilistic pushdown automata (also called
pBPA) [5]; these are two equivalent models for a subclass of probabilistic
programs with recursive procedures. The above regular pattern matching problem
for SCFGs is equivalent to the problem of computing the probability that a 
computation of a given 1-RMC (or pBPA) terminates and satisfies a given reg- 
ular property. In other words, it corresponds to the quantitative model checking 
problem for 1-RMCs with respect to regular finite string properties. 

We first review some prior related work, and then describe our results. 

Previous Work. As mentioned above, there has been, on the one hand, substantial
work in the NLP literature on different cases of the problem for various
regular languages L, and on the other hand, there has been work in the
verification and algorithms literature on the analysis and model checking of
recursive Markov chains and probabilistic pushdown automata. Nevertheless, even for
the simple special case of $L = \Sigma^*$, the question of whether it is possible to compute
(approximately) in polynomial time the desired probability for a given SCFG
G (i.e. the probability $P_G(L(G))$ of $L(G)$) was open until very recently. In [7]
we showed that $P_G(L(G))$ can be computed to arbitrary precision in polynomial
time in the size of the input SCFG G and the number of bits of precision. From
a SCFG G, one can construct a multivariate system of equations $x = P_G(x)$,
where x is a vector of variables and $P_G$ is a vector of polynomials with positive
coefficients which sum to (at most) 1. Such a system is called a probabilistic
polynomial system (PPS), and it always has a non-negative solution that is smallest
in every coordinate, called the least fixed point (LFP). A particular coordinate
of the LFP of the system $x = P_G(x)$ is the desired probability $P_G(L(G))$. To
compute $P_G(L(G))$, we used a variant of Newton's method on $x = P_G(x)$, with
suitable rounding after each step to control the bit-size of numbers, and showed
that it converges in P-time to the LFP [7]. Building on this, we also showed that
the probability $P_G(\{w\})$ of string w under SCFG G can also be computed to
any precision in P-time in the size of G, w and the number of bits of precision.
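For a single equation x = P(x), the Newton step reduces to $x \leftarrow x + (P(x) - x)/(1 - P'(x))$. The following sketch (plain floating point, without the rounding that [7] adds to control bit sizes; the example system is hypothetical) counts how many iterations fixed-point iteration and Newton's method each need, starting from 0, to come within $10^{-12}$ of the least fixed point $(\sqrt{5} - 1)/2$ of $x = x^3/2 + 1/2$.

```python
import math

P = lambda x: 0.5 * x**3 + 0.5     # hypothetical PPS: S -> S S S (1/2) | a (1/2)
dP = lambda x: 1.5 * x**2          # its derivative, i.e. the 1x1 Jacobian
q_star = (math.sqrt(5) - 1) / 2    # least fixed point, known in closed form here

def steps_to_tol(step, tol=1e-12, max_steps=10_000):
    """Iterations of `step`, starting from x = 0, until |x - q_star| < tol."""
    x, k = 0.0, 0
    while abs(x - q_star) >= tol and k < max_steps:
        x, k = step(x), k + 1
    return k

kleene_steps = steps_to_tol(P)                                       # x <- P(x)
newton_steps = steps_to_tol(lambda x: x + (P(x) - x) / (1 - dP(x)))  # Newton step
print(kleene_steps, newton_steps)
```

On this example Newton's method needs only a handful of iterations, versus a few dozen for fixed-point iteration; since the former converges quadratically here and the latter only linearly, the gap widens as more bits of precision are requested.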



The use of Newton's method was proposed originally in [8] for computing
termination probabilities for (multi-exit) RMCs, which requires the solution of
equations from a more general class of polynomial systems $x = P(x)$, called
monotone polynomial systems (MPS), where the polynomials of P have positive
coefficients, but their sum is not restricted to $\leq 1$. An arbitrary MPS may not
have any non-negative solution, but if it does then it has an LFP, and a version
of Newton's method provably converges to the LFP. There are now implementations of
variants of Newton's method in several tools [TF| and experiments show that
they perform well on many instances. The rate of convergence of Newton's method for
general MPSs was studied in detail in [4], and was further studied most recently
in [20] (see below). In certain cases, Newton's method converges fast, but in general
there are bad examples requiring exponentially many iterations. Furthermore, there
are negative results indicating it is very unlikely that any non-trivial approximation
of termination probabilities of multi-exit RMCs, and the LFP of MPSs, can be done in
P-time (see [5]).

The model checking problem for RMCs (equivalently pPDAs) and ω-regular
properties was studied in [5, 3]. This is of course a more general problem than
the problem for SCFGs (which correspond to 1-RMCs) and regular languages
(the finite string case of ω-regular languages). It was shown in [5] that in the
case of 1-RMCs, the qualitative problem of determining whether the probability
that a run satisfies the property is 0 or 1 can be solved in P-time in the size of
the 1-RMC, but for the quantitative problem of approximating the probability,
the algorithm runs in PSPACE, and no better complexity bound was known.

The particular cases of computing prefix and infix probabilities for a SCFG
have been studied in the NLP literature, but no polynomial time algorithm for
general SCFGs is known. Jelinek and Lafferty gave an algorithm for grammars
in Chomsky Normal Form (CNF) [11]. Note that a general SCFG G may not
have any equivalent CNF grammar with rational rule probabilities, thus one can
only hope for an "approximately equivalent" CNF grammar; constructing such
a grammar in the case of stochastic grammars G is non-trivial, at least as
difficult as computing the probability of $L(G)$, and the first P-time algorithm was
given in [7]. Another algorithm for prefix probabilities by Stolcke [21] applies
to general SCFGs, but in the presence of unary and ε-rules, the algorithm does
not run in polynomial time. The problem of computing infix probabilities was
studied in [3 [TCI [H], and in particular [THl [TB] cast it in the general regular
language framework, and studied the general problem of computing the probability
$P_G(L(D))$ of the language $L(D)$ of a DFA D under a SCFG G. From G and
D they construct a product weighted context-free grammar (WCFG) G': a CFG
with (positive) weights on the rules, which may not be probabilities; in
particular, the weights on the rules of a nonterminal may sum to more than 1. The
desired probability $P_G(L(D))$ is the weight of $L(G')$. As in the case of SCFGs,
this weight is given by the LFP of a monotone system of equations $y = P_{G'}(y)$;
however, unlike the case of SCFGs, the system now is not a probabilistic system
(thus our result of [7] does not apply). Nederhof and Satta then solve the system
using the decomposed Newton method from [8] and Broyden's (quasi-Newton)
method, and present experimental results for infix probability computations.



Most recently, in [20], we have obtained worst-case upper bounds on (rounded
and exact) Newton's method applied to arbitrary MPSs, $x = P(x)$, as a function
of the input encoding size $|P|$ and $\log(1/\epsilon)$, to converge to within additive error
$\epsilon > 0$ of the LFP solution $q^*$. However, our bounds in [20], even when $0 < q^* \leq 1$,
are exponential in the depth of (not necessarily critical) strongly connected
components of $x = P(x)$, and furthermore they also depend linearly on $\log(1/q^*_{\min})$,
where $q^*_{\min} = \min_i q^*_i$, which can be extremely small (even doubly exponentially
small in $|P|$). As we describe next, we do far better in this paper for the MPSs
that arise from the "product" of a SCFG and a DFA.

Our Results. We study the general problem of computing the probability
$P_G(L(D))$ that a given SCFG G generates a string in the language $L(D)$ of
a given DFA D. We show that, under a certain mild assumption on G, this
probability can be computed to any desired precision in time polynomial in the
encoding sizes of G and D and the number of bits of precision.

We now sketch briefly the approach and state the assumption on G. First we
construct from G and D the product weighted CFG $G' = G \otimes D$ as in [16] and
construct the corresponding MPS $y = P_{G'}(y)$, whose LFP contains the desired
probability $P_G(L(D))$ as one of its components. The system is monotone but not
probabilistic. We eliminate (in P-time) those variables that have value 0 in the
LFP, and apply Newton's method, with suitable rounding in every step. The heart of
the analysis shows there is a tight algebraic correspondence between the behavior of
Newton's method on this MPS and its behavior on the probabilistic polynomial
system (PPS) $x = P_G(x)$ of G. In particular, this correspondence shows that,
with exact arithmetic, the two computations converge at the same rate. By
exploiting this, and by extending recent results we established for PPSs, we
obtain the conditional polynomial upper bound. Specifically, call a PPS $x = P(x)$
critical if the spectral radius of the Jacobian of $P(x)$, evaluated at the LFP $q^*$,
is equal to 1 (it is always $\leq 1$). We can form a dependency graph between the
variables of a PPS, and decompose the variables and the system into strongly
connected components (SCCs); an SCC is called critical if the induced subsystem
on that SCC is critical. The critical depth of a PPS is the maximum number of
critical SCCs on any path of the DAG of SCCs (i.e. the max nesting depth of
critical SCCs). We show that if the PPS of the given SCFG G has bounded (or
even logarithmic) critical depth, then we can compute $P_G(L(D))$ (for any DFA
D) in polynomial time in the size of G, D and the number of bits of precision.

Furthermore, we show this condition is satisfied by a broad class of SCFGs 
used in applications. Specifically, a standard way the probabilities of rules of a 
SCFG are set is by using the EM (inside-outside) algorithm. We show that the 
SCFGs constructed in this way are guaranteed to be noncritical (i.e., have critical 
depth 0). So for these SCFGs, and any DFA, the algorithm runs in P-time. 

The paper is organized as follows. Section 2 gives definitions and background. 
Section 3 establishes tight algebraic connections between the behavior of Newton 
on the PPS of the SCFG, and on the MPS of the product WCFG. Section 4 
proves the claimed bounds on rounded Newton's method. Section 5 shows the 
noncriticality of SCFGs obtained by the EM method. Proofs are in the Appendix. 



2 Definitions and Background 



A weighted context-free grammar (WCFG), $G = (V, \Sigma, R, p)$, has a finite set V
of nonterminals, a finite set $\Sigma$ of terminals (alphabet symbols), and a finite list
of rules, $R \subseteq V \times (V \cup \Sigma)^*$, where each rule $r \in R$ is a pair $(A, \gamma)$, which we
usually denote by $A \to \gamma$, where $A \in V$ and $\gamma \in (V \cup \Sigma)^*$. Finally $p : R \to \mathbb{R}_{>0}$
maps each rule $r \in R$ to a positive weight, $p(r) > 0$. We often denote a rule
$r = (A \to \gamma)$ together with its weight by writing $A \xrightarrow{p(r)} \gamma$. We will sometimes
also specify a specific nonterminal $S \in V$ as the starting symbol.

Note that we allow $\gamma \in (V \cup \Sigma)^*$ to possibly be the empty string, denoted
by ε. A rule of the form $A \to \epsilon$ is called an ε-rule. For a rule $r = (A \to \gamma)$, we
let left(r) := A and right(r) := γ. We let $R_A = \{r \in R \mid \mathrm{left}(r) = A\}$.
For $A \in V$, let $p(A) = \sum_{r \in R_A} p(r)$. A WCFG, G, is called a stochastic or
probabilistic context-free grammar (SCFG or PCFG; we shall use SCFG), if
$p(A) \leq 1$ for all $A \in V$. An SCFG is called proper if $p(A) = 1$ for all $A \in V$.
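A minimal sketch of these definitions, assuming a hypothetical representation of a WCFG as a list of (A, γ, p(r)) triples with exact `Fraction` weights, where γ is a tuple of symbols (the empty tuple stands for ε). It computes p(A) for every nonterminal and checks the SCFG condition (p(A) ≤ 1) and properness (p(A) = 1).

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical representation: rule A -> gamma with weight p(r) is (A, gamma, p).
Rule = tuple[str, tuple[str, ...], Fraction]

def rule_weight_sums(rules: list[Rule]) -> dict[str, Fraction]:
    """p(A) = sum of p(r) over all rules r in R_A (rules with left-hand side A)."""
    sums: dict[str, Fraction] = defaultdict(Fraction)   # Fraction() == 0
    for lhs, _rhs, weight in rules:
        sums[lhs] += weight
    return dict(sums)

def is_scfg(rules: list[Rule]) -> bool:
    """A WCFG is an SCFG if p(A) <= 1 for every nonterminal A."""
    return all(s <= 1 for s in rule_weight_sums(rules).values())

def is_proper(rules: list[Rule], nonterminals: set[str]) -> bool:
    """An SCFG is proper if p(A) = 1 for every nonterminal A."""
    sums = rule_weight_sums(rules)
    return all(sums.get(a, Fraction(0)) == 1 for a in nonterminals)

rules = [
    ("S", ("S", "S"), Fraction(2, 3)),
    ("S", ("a",), Fraction(1, 3)),
    ("B", ("b",), Fraction(1, 2)),
]
print(rule_weight_sums(rules))   # {'S': Fraction(1, 1), 'B': Fraction(1, 2)}
print(is_scfg(rules), is_proper(rules, {"S", "B"}))   # True False
```

The rules for S alone form a proper SCFG, yet properness does not imply consistency: $q^G_S$ is the least root of $x = \frac{2}{3}x^2 + \frac{1}{3}$, i.e. of $(2x - 1)(x - 1) = 0$, so $q^G_S = 1/2$.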

We will say that a WCFG, $G = (V, \Sigma, R, p)$, is in Simple Normal Form
(SNF) if every nonterminal $A \in V$ belongs to one of the following three types:

1. type L: every rule $r \in R_A$ has the form $A \to B$, for some $B \in V$.

2. type Q: there is a single rule in $R_A$: $A \to BC$, for some $B, C \in V$.

3. type T: there is a single rule in $R_A$: either $A \to \epsilon$, or $A \to a$ for some $a \in \Sigma$.
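The three SNF types can be checked mechanically. This sketch reuses a hypothetical (lhs, rhs-tuple, weight) rule representation, with the empty tuple standing for ε; it ignores weights, since the types are defined purely by the shape of the rules. Read literally, the type-L condition holds vacuously for a nonterminal with no rules, and the code follows that reading.

```python
from fractions import Fraction

def snf_type(a, rules, nonterminals):
    """Return 'L', 'Q' or 'T' according to which SNF type nonterminal `a` has,
    or None if it has none. `rules` holds (lhs, rhs_tuple, weight) triples."""
    rhss = [rhs for lhs, rhs, _w in rules if lhs == a]
    # Type L: every rule of a has the form a -> B with B a nonterminal.
    if all(len(r) == 1 and r[0] in nonterminals for r in rhss):
        return "L"
    if len(rhss) == 1:
        (r,) = rhss
        # Type Q: a single rule a -> B C with B, C nonterminals.
        if len(r) == 2 and all(sym in nonterminals for sym in r):
            return "Q"
        # Type T: a single rule a -> epsilon or a -> (terminal).
        if len(r) == 0 or (len(r) == 1 and r[0] not in nonterminals):
            return "T"
    return None

def is_snf(rules, nonterminals):
    return all(snf_type(a, rules, nonterminals) is not None for a in nonterminals)

# Hypothetical SNF grammar: S -> X (1/3) | Y (2/3), X -> S S, Y -> a.
V = {"S", "X", "Y"}
rules = [
    ("S", ("X",), Fraction(1, 3)), ("S", ("Y",), Fraction(2, 3)),
    ("X", ("S", "S"), Fraction(1)), ("Y", ("a",), Fraction(1)),
]
print({a: snf_type(a, rules, V) for a in sorted(V)})   # {'S': 'L', 'X': 'Q', 'Y': 'T'}
print(is_snf([("S", ("S", "S"), Fraction(1, 3)), ("S", ("a",), Fraction(2, 3))], {"S"}))  # False
```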

For a WCFG, G, strings $\alpha, \beta \in (V \cup \Sigma)^*$, and $\pi = r_1 \cdots r_k \in R^*$, we write
$\alpha \stackrel{\pi}{\Rightarrow} \beta$ if the leftmost derivation starting from α, and applying the sequence π
of rules, derives β. We let $p(\alpha \stackrel{\pi}{\Rightarrow} \beta) = \prod_{i=1}^{k} p(r_i)$ if $\alpha \stackrel{\pi}{\Rightarrow} \beta$, and $p(\alpha \stackrel{\pi}{\Rightarrow} \beta) = 0$
otherwise. If $A \stackrel{\pi}{\Rightarrow} w$ for $A \in V$ and $w \in \Sigma^*$, we say that π is a complete derivation
from A and its yield is $y(\pi) = w$. There is a natural one-to-one correspondence
between the complete derivations of w starting at A and the parse trees of w
rooted at A, and this correspondence preserves weights.

For a WCFG, $G = (V, \Sigma, R, p)$, nonterminal $A \in V$, and terminal string
$w \in \Sigma^*$, we let $p^{G,w}_A = \sum_{\{\pi \mid y(\pi) = w\}} p(A \stackrel{\pi}{\Rightarrow} w)$. For a general WCFG, $p^{G,w}_A$ need
not be a finite value (it may be $+\infty$, since the sum may not converge). Note
however that if G is an SCFG, then $p^{G,w}_A$ defines the probability that, starting
at nonterminal A, G generates w, and thus it is clearly finite.

The termination probability (termination weight) of an SCFG (WCFG), G,
starting at nonterminal A, denoted $q^G_A$, is defined by $q^G_A = \sum_{w \in \Sigma^*} p^{G,w}_A$. Again,
for an arbitrary WCFG, $q^G_A$ need not be a finite number. A WCFG G is called
convergent if $q^G_A$ is finite for all $A \in V$. We will only encounter convergent
WCFGs in this paper, so when we say WCFG we mean convergent WCFG,
unless otherwise specified. If G is an SCFG, then $q^G_A$ is just the total probability
with which the derivation process starting at A eventually generates a finite
string and (thus) stops, so SCFGs are clearly convergent.

An SCFG, G, is called consistent starting at A if $q^G_A = 1$, and G is called
consistent if it is consistent starting at every nonterminal. Note that even if a
SCFG, G, is proper, this does not necessarily imply that G is consistent. For an
SCFG, G, we can decide whether $q^G_A = 1$ in P-time ([8]). The same decision
problem is PosSLP-hard for convergent WCFGs ([8]).

For any WCFG, $G = (V, \Sigma, R, p)$, with $n = |V|$, assume the nonterminals
in V are indexed as $A_1, \ldots, A_n$. We define the following monotone polynomial
system of equations (MPS) associated with G, denoted $x = P_G(x)$.
Here $x = (x_1, \ldots, x_n)$ denotes an n-vector of variables. Likewise
$P_G(x) = (P_G(x)_1, \ldots, P_G(x)_n)$ denotes an n-vector of multivariate polynomials over the
variables $x = (x_1, \ldots, x_n)$. For a vector $\kappa = (\kappa_1, \kappa_2, \ldots, \kappa_n) \in \mathbb{N}^n$, we use the
notation $x^{\kappa}$ to denote the monomial $x_1^{\kappa_1} x_2^{\kappa_2} \cdots x_n^{\kappa_n}$. For a nonterminal $A_i \in V$,
and a string $\alpha \in (V \cup \Sigma)^*$, let $\kappa_i(\alpha) \in \mathbb{N}$ denote the number of occurrences of
$A_i$ in the string α. We define $\kappa(\alpha) \in \mathbb{N}^n$ to be $\kappa(\alpha) = (\kappa_1(\alpha), \kappa_2(\alpha), \ldots, \kappa_n(\alpha))$.

In the MPS $x = P_G(x)$, corresponding to each nonterminal $A_i \in V$, there
will be one variable $x_i$ and one equation, namely $x_i = P_G(x)_i$, where:

$$P_G(x)_i = \sum_{r = (A_i \to \alpha) \in R_{A_i}} p(r)\, x^{\kappa(\alpha)}.$$

If there are no rules associated with $A_i$, i.e., if $R_{A_i} = \emptyset$,
then by default we define $P_G(x)_i = 0$. Note that if $r \in R_{A_i}$ is a terminal rule,
i.e., $\kappa(\mathrm{right}(r)) = (0, \ldots, 0)$, then $p(r)$ is one of the constant terms of $P_G(x)_i$.

Note: Throughout this paper, for any n-vector z whose i'th coordinate $z_i$
"corresponds" to nonterminal $A_i$, we often find it convenient to use $z_{A_i}$ to refer
to $z_i$. So, e.g., we alternatively use $x_{A_i}$ and $P_G(x)_{A_i}$, instead of $x_i$ and $P_G(x)_i$.

Note that if G is a SCFG, then in $x = P_G(x)$, by definition, the sum of the
monomial coefficients and constant terms of each polynomial $P_G(x)_i$ is at most
1, because $\sum_{r \in R_{A_i}} p(r) \leq 1$ for every $A_i \in V$. An MPS that satisfies this extra
condition is called a probabilistic polynomial system of equations (PPS).
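A sketch of the construction of $P_G(x)$, under the same hypothetical (lhs, rhs-tuple, weight) rule representation: each rule contributes its weight times the monomial $x^{\kappa(\alpha)}$, which records how often each nonterminal occurs on its right-hand side (terminals contribute nothing), and the PPS condition is checked by summing the coefficients of each polynomial.

```python
from collections import Counter
from fractions import Fraction

def build_mps(rules, nonterminals):
    """Return P_G as a dict A -> list of (coefficient, kappa) pairs, where kappa
    is a Counter mapping each nonterminal B to its exponent kappa_B(alpha)."""
    P = {a: [] for a in nonterminals}          # R_A empty  =>  P_G(x)_A = 0
    for lhs, rhs, weight in rules:
        kappa = Counter(sym for sym in rhs if sym in nonterminals)
        P[lhs].append((weight, kappa))
    return P

def evaluate(P, x):
    """Evaluate every polynomial P_G(x)_A at the point x (a dict A -> value)."""
    out = {}
    for a, terms in P.items():
        total = 0
        for coeff, kappa in terms:
            monomial = 1
            for b, k in kappa.items():
                monomial *= x[b] ** k
            total += coeff * monomial
        out[a] = total
    return out

def is_pps(P):
    """PPS condition: the coefficients of each polynomial sum to at most 1."""
    return all(sum(c for c, _ in terms) <= 1 for terms in P.values())

rules = [("S", ("S", "S"), Fraction(2, 3)), ("S", ("a",), Fraction(1, 3))]
P = build_mps(rules, {"S"})
print(evaluate(P, {"S": Fraction(1, 2)}))   # {'S': Fraction(1, 2)}
print(is_pps(P))                            # True
```

For this proper SCFG both x = 1/2 and x = 1 are fixed points of $P_G$; by Proposition 1 below, the termination probability is the least one.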

Consider any MPS, $x = P(x)$, with n variables, $x = (x_1, \ldots, x_n)$. Let $\mathbb{R}_{\geq 0}$
denote the non-negative real numbers. Then $P(x)$ defines a monotone operator
on the non-negative orthant $\mathbb{R}^n_{\geq 0}$. In general, an MPS need not have any
real-valued solution: consider $x = x + 1$. However, by monotonicity of $P(x)$, if there
exists $a \in \mathbb{R}^n_{\geq 0}$ such that $a = P(a)$, then there is a least fixed point (LFP) solution
$q^* \in \mathbb{R}^n_{\geq 0}$ such that $q^* = P(q^*)$, and such that $q^* \leq a$ for all solutions $a \in \mathbb{R}^n_{\geq 0}$.

Proposition 1. (cf. [8] or see [17]) For any SCFG (or convergent WCFG), G,
with n nonterminals $A_1, \ldots, A_n$, the LFP solution of $x = P_G(x)$ is the n-vector
$q^G = (q^G_{A_1}, \ldots, q^G_{A_n})$ of termination probabilities (termination weights) of G.

For computation purposes, we assume that the input probabilities (weights)
associated with rules of input SCFGs or WCFGs are positive rationals encoded
by giving their numerator and denominator in binary. We use $|G|$ to denote the
encoding size (i.e., number of bits) of an input WCFG G.

Given any WCFG (SCFG) $G = (V, \Sigma, R, p)$ we can compute in linear time
an SNF form WCFG (resp. SCFG) $G' = (V', \Sigma, R', p')$ of size $|G'| = O(|G|)$ with
$V' \supseteq V$ such that $p^{G,w}_A = p^{G',w}_A$ for all $A \in V$, $w \in \Sigma^*$ (cf. [8] and Proposition
2.1 of [7]). Thus, for the problems studied in this paper, we may assume wlog
that a given input WCFG or SCFG is in SNF form.

A DFA, $D = (Q, \Sigma, \Delta, s_0, F)$, has states Q, alphabet Σ, transition function
$\Delta : Q \times \Sigma \to Q$, start state $s_0 \in Q$ and final states $F \subseteq Q$. We extend Δ to
strings: $\Delta^* : Q \times \Sigma^* \to Q$ is defined by induction on the length $|w| \geq 0$ of
$w \in \Sigma^*$: for $s \in Q$, $\Delta^*(s, \epsilon) := s$. Inductively, if $w = aw'$, with $a \in \Sigma$, then
$\Delta^*(s, w) := \Delta^*(\Delta(s, a), w')$. We define $L(D) = \{w \in \Sigma^* \mid \Delta^*(s_0, w) \in F\}$.
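A direct transcription of $\Delta^*$ and $L(D)$ for a hypothetical 3-state DFA over {a, b} that accepts the infix language $\Sigma^* aa \Sigma^*$; the recursion on $w = aw'$ becomes a loop over the letters of w.

```python
# Hypothetical 3-state DFA over {a, b} accepting L = Sigma* aa Sigma*:
# state 0 = no progress, 1 = just read an 'a', 2 = 'aa' already seen (absorbing).
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 2,
}
START, FINAL = 0, {2}

def delta_star(s, w):
    """Extended transition function: Delta*(s, eps) = s and
    Delta*(s, a w') = Delta*(Delta(s, a), w'), written as a loop."""
    for a in w:
        s = DELTA[(s, a)]
    return s

def accepts(w):
    """w is in L(D) iff Delta*(s0, w) is a final state."""
    return delta_star(START, w) in FINAL

print([w for w in ["", "a", "aa", "abab", "baab", "aba"] if accepts(w)])   # ['aa', 'baab']
```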

Given a WCFG G and a DFA D over the same terminal alphabet, for any
nonterminal A of G, we define $q^{G,D}_A = \sum_{w \in L(D)} p^{G,w}_A$. If G is a SCFG, $q^{G,D}_A$
simply denotes the probability that G, starting at A, generates a string in $L(D)$.
Our goal is to compute $q^{G,D}_A$, given SCFG G and DFA D. In general, $q^{G,D}_A$ may be
an irrational probability, even when all of the rule probabilities of G are rational
values. So one natural goal is to approximate $q^{G,D}_A$ to within desired precision.
More precisely, the approximation problem is this: given as input an SCFG, G,
with a specified nonterminal A, a DFA, D, over the same terminal alphabet Σ,
and a rational error threshold $\delta > 0$, output a rational value $v \in [0, 1]$ such that
$|v - q^{G,D}_A| < \delta$. We would like to do this as efficiently as possible as a function
of the input size: $|G|$, $|D|$, and $\log(1/\delta)$.

To compute $q^{G,D}_A$, it will be useful to define a WCFG obtained as the product
of a SCFG and a DFA. We assume, wlog, that the input SCFG is in
SNF form. The product (or intersection) of a SCFG $G = (V, \Sigma, R, p)$ in
SNF form, and DFA, $D = (Q, \Sigma, \Delta, s_0, F)$, is defined to be a new WCFG,
$G \otimes D = (V', \Sigma, R', p')$, where the set of nonterminals is $V' = Q \times V \times Q$.
Assuming $n = |V|$ and $d = |Q|$, then $|V'| = d^2 n$. The rules $R'$ and rule probabilities
$p'$ of the product $G \otimes D$ are defined as follows (recall G is assumed to be in SNF):

- Rules of form L: For every rule of the form $(A \xrightarrow{p} B) \in R$, and every pair of
states $s, t \in Q$, there is a rule $(sAt) \xrightarrow{p} (sBt)$ in $R'$.

- Rules of form Q: For every rule $(A \xrightarrow{p} BC) \in R$, and for all states $s, t, u \in Q$,
there is a rule $(sAu) \xrightarrow{p} (sBt)(tCu)$ in $R'$.

- Rules of form T: For every rule $(A \xrightarrow{p} a) \in R$, where $a \in \Sigma$, and for every
state $s \in Q$, if $\Delta(s, a) = t$, then there is a rule $(sAt) \xrightarrow{p} a$ in $R'$.
For every rule $(A \xrightarrow{p} \epsilon) \in R$, and every $s \in Q$, there is a rule $(sAs) \xrightarrow{p} \epsilon$ in $R'$.
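The three product rule forms translate directly into code. In this sketch (our own representation, not the paper's: grammar rules as (A, rhs-tuple, weight) triples, product nonterminals as (s, A, t) tuples, and the DFA transition function as a dict), R′ is built for an SNF grammar, and each product rule keeps the weight p of the rule it comes from.

```python
from fractions import Fraction
from itertools import product

def product_rules(rules, nonterminals, states, delta):
    """Rules of G (x) D for an SNF grammar G and a DFA D (sketch).
    delta maps (state, terminal) -> state; product nonterminals are (s, A, t)."""
    out = []
    for lhs, rhs, w in rules:
        if len(rhs) == 1 and rhs[0] in nonterminals:        # form L: A -> B
            for s, t in product(states, repeat=2):
                out.append(((s, lhs, t), ((s, rhs[0], t),), w))
        elif len(rhs) == 2:                                  # form Q: A -> B C
            b, c = rhs
            for s, t, u in product(states, repeat=3):
                out.append(((s, lhs, u), ((s, b, t), (t, c, u)), w))
        elif len(rhs) == 1:                                  # form T: A -> a
            for s in states:
                out.append(((s, lhs, delta[(s, rhs[0])]), rhs, w))
        else:                                                # form T: A -> eps
            for s in states:
                out.append(((s, lhs, s), (), w))
    return out

# Hypothetical SNF grammar S -> X (1/3) | Y (2/3), X -> S S, Y -> a,
# and a 2-state DFA over {a} that tracks the parity of the length.
V = {"S", "X", "Y"}
rules = [
    ("S", ("X",), Fraction(1, 3)), ("S", ("Y",), Fraction(2, 3)),
    ("X", ("S", "S"), Fraction(1)), ("Y", ("a",), Fraction(1)),
]
R_prod = product_rules(rules, V, states=(0, 1), delta={(0, "a"): 1, (1, "a"): 0})
print(len(R_prod))   # 2*4 (form L) + 1*8 (form Q) + 1*2 (form T) = 18
```

The weights of the rules of a product nonterminal can sum to more than 1: (0, X, 0) gets one form-Q rule per intermediate state t, each of weight 1, so its weights sum to d = 2. This is why $y = P_{G \otimes D}(y)$ is an MPS but in general not a PPS.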

Associated with the WCFG, $G \otimes D$, is the MPS $y = P_{G \otimes D}(y)$, where y is now
a $d^2 n$-vector of variables, where $n = |V|$ and $d = |Q|$. The LFP solution of this
MPS captures the probabilities $q^{G,D}_A$ in the following sense:

Proposition 2. (cf. [18], or [9] for a variant of this) For any SCFG, $G = (V, \Sigma, R, p)$,
and DFA, $D = (Q, \Sigma, \Delta, s_0, F)$, the LFP solution $q^{G \otimes D}$ of the MPS
$y = P_{G \otimes D}(y)$ satisfies $0 \leq q^{G \otimes D} \leq 1$. Furthermore, for any $A \in V$ and $s, t \in Q$,
$q^{G \otimes D}_{(sAt)} = \sum_{\{w \mid \Delta^*(s,w) = t\}} p^{G,w}_A$. Thus, for every $A \in V$, $q^{G,D}_A = \sum_{t \in F} q^{G \otimes D}_{(s_0 A t)}$.
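Proposition 2 can be checked numerically on a small instance. This sketch uses a hypothetical SCFG S → SS (1/3) | a (2/3) (not in SNF; for brevity we apply the product construction directly to its two rules) and the 2-state DFA over {a} whose state is the parity of the length read so far, with $s_0 = 0$ and $F = \{0\}$, so that $q^{G,D}_S$ is the probability of generating a string of even length. Writing $e = y_{(0S0)} = y_{(1S1)}$ and $o = y_{(0S1)} = y_{(1S0)}$, symmetry and $e + o = q^G_S = 1$ reduce the product MPS by hand to $e = (e^2 + (1 - e)^2)/3$, i.e. $2e^2 - 5e + 1 = 0$, whose least root is $e = (5 - \sqrt{17})/4 \approx 0.219$. The code recovers this by fixed-point iteration on the full product MPS.

```python
import math
from itertools import product

STATES = (0, 1)            # parity DFA over {a}: the state is the length mod 2
DELTA_A = {0: 1, 1: 0}     # Delta(s, a)

def P_product(y):
    """One application of the product MPS for S -> S S (1/3) | a (2/3);
    y maps a state pair (s, t) to the value of the variable y_(sSt)."""
    return {
        (s, t): sum(y[(s, u)] * y[(u, t)] for u in STATES) / 3
        + (2 / 3 if DELTA_A[s] == t else 0.0)
        for s, t in product(STATES, repeat=2)
    }

y = {st: 0.0 for st in product(STATES, repeat=2)}
for _ in range(300):                   # fixed-point iteration from 0 towards the LFP
    y = P_product(y)

q_S = sum(y[(0, t)] for t in STATES)   # sum over all t: should equal q_S^G = 1
q_even = y[(0, 0)]                     # q_S^{G,D} with s0 = 0 and F = {0}
print(q_S, q_even, (5 - math.sqrt(17)) / 4)
```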

Newton's method (NM). For an MPS (or PPS), $x = P(x)$, in n variables,
let $B(x) := P'(x)$ denote the Jacobian matrix of $P(x)$. In other words, $B(x)$
is an $n \times n$ matrix such that $B(x)_{i,j} = \frac{\partial P(x)_i}{\partial x_j}$. For a vector $z \in \mathbb{R}^n$, assuming
that the matrix $(I - B(z))$ is non-singular, we define a single iteration of Newton's
method (NM) for $x = P(x)$ on z via the following operator:



$$\mathcal{N}(z) := z + (I - B(z))^{-1}(P(z) - z) \qquad (1)$$



Using Newton iteration, starting at the n-vector $x^{(0)} := 0$, yields the following
iteration: $x^{(k+1)} := \mathcal{N}(x^{(k)})$, for $k = 0, 1, 2, \ldots$.
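A minimal floating-point sketch of the Newton operator (1) and of the iteration from $x^{(0)} = 0$, using NumPy (an assumption; any linear solver would do). It solves the linear system $(I - B(z))\,\Delta = P(z) - z$ rather than forming the inverse explicitly. The 2-variable MPS in the example is hypothetical: adding its two equations gives $s = s^2/3 + 2/3$ for $s = x_1 + x_2$, so $s \in \{1, 2\}$, and the LFP has $s = 1$, which yields the closed form $((5 - \sqrt{17})/4, (\sqrt{17} - 1)/4)$.

```python
import numpy as np

def newton_step(P, B, z):
    """One Newton iteration N(z) = z + (I - B(z))^{-1} (P(z) - z)."""
    n = len(z)
    return z + np.linalg.solve(np.eye(n) - B(z), P(z) - z)

def newton_iterates(P, B, n, k):
    """Return [x^(0), ..., x^(k)], starting from x^(0) = 0."""
    xs = [np.zeros(n)]
    for _ in range(k):
        xs.append(newton_step(P, B, xs[-1]))
    return xs

# Hypothetical MPS:  x1 = (x1^2 + x2^2) / 3,  x2 = (2/3) x1 x2 + 2/3
P = lambda x: np.array([(x[0]**2 + x[1]**2) / 3, 2 * x[0] * x[1] / 3 + 2 / 3])
B = lambda x: np.array([[2 * x[0] / 3, 2 * x[1] / 3],    # Jacobian of P
                        [2 * x[1] / 3, 2 * x[0] / 3]])

xs = newton_iterates(P, B, 2, 8)
print(xs[-1])   # close to ((5 - sqrt(17))/4, (sqrt(17) - 1)/4)
```

The iterates increase monotonically towards the LFP, as Proposition 3 below guarantees for MPSs with $q^* > 0$.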

For every MPS, we can detect in P-time all the variables $x_j$ such that $q^*_j = 0$
[8]. We can then remove these variables and their corresponding equations
$x_j = P(x)_j$, and substitute their values on the right-hand sides of the remaining equations.
This yields a new MPS, with LFP $q' > 0$, which corresponds to the non-zero
coordinates of $q^*$. It was shown in [8] that one can always apply a decomposed
Newton's method to this MPS, to converge monotonically to the LFP solution.

Proposition 3. (cf. Theorem 6.1 of [8] and Theorem 4.1 of [4]) Let $x = P(x)$
be an MPS, with LFP $q^* > 0$. Then starting at $x^{(0)} := 0$, the Newton iterations
$x^{(k+1)} := \mathcal{N}(x^{(k)})$ are well defined and monotonically converge to $q^*$, i.e.
$\lim_{k \to \infty} x^{(k)} = q^*$, and $x^{(k+1)} \geq x^{(k)} \geq 0$ for all $k \geq 0$.

Unfortunately, it was shown in [8] that obtaining any non-trivial additive
approximation to the LFP solution of a general MPS, even one whose LFP satisfies
$0 < q^* \leq 1$, is PosSLP-hard, so we cannot compute the termination weights of
general WCFGs in P-time (nor even in NP) without a major breakthrough in
the complexity of numerical computation. (See [8] for more information.)

Fortunately, for the class of PPSs, we can do a lot better. First, we can also identify
in P-time all the variables $x_j$ such that $q^*_j = 1$ |H] and remove them from
the system. We showed recently in [7] that by then applying a suitably rounded-down
variant of Newton's method to the resulting PPS, we can approximate $q^*$
within additive error $2^{-j}$ in time polynomial in the size of the PPS and j.

3 Balance, Collapse, and Newton's method 

For an SCFG, $G = (V, \Sigma, R, p)$, and a DFA, $D = (Q, \Sigma, \Delta, s_0, F)$, we want to
relate the behavior of Newton's method on the MPS associated with the WCFG,
$G \otimes D$, to that of the PPS associated with the SCFG G. We shall show that there
is indeed a tight correspondence, regardless of what the DFA D is. This holds
even when G itself is a convergent WCFG, and thus $x = P_G(x)$ is an MPS. We
need an abstract algebraic way to express this correspondence. A key notion will
be balance, and the collapse operator defined on balanced vectors and matrices.

Consider the LFP $q^G$ of $x = P_G(x)$, and the LFP $q^{G \otimes D}$ of $y = P_{G \otimes D}(y)$. By
Propositions 1 and 2, for any $A \in V$, $q^G_A = \sum_{w \in \Sigma^*} p^{G,w}_A$ is the probability (weight) that G,
starting at A, generates any finite string. Likewise $q^{G \otimes D}_{(sAt)} = \sum_{\{w \mid \Delta^*(s,w) = t\}} p^{G,w}_A$
is the probability (weight) that, starting at A, G generates a finite string w such
that $\Delta^*(s, w) = t$. Thus, for any $A \in V$ and $s \in Q$, $q^G_A = \sum_{t \in Q} q^{G \otimes D}_{(sAt)}$.

It turns out that analogous relationships hold between many other vectors
associated with G and $G \otimes D$, including between the Newton iterates obtained
by applying Newton's method to the PPS (or MPS) of G and to the product
MPS. Furthermore, corresponding relationships also hold between the Jacobian
matrices $B_G(x)$ and $B_{G \otimes D}(y)$ of $P_G(x)$ and $P_{G \otimes D}(y)$, respectively.

Let $n = |V|$ and let $d = |Q|$. A vector $y \in \mathbb{R}^{d^2 n}$ whose coordinates are
indexed by triples $(sAt) \in Q \times V \times Q$ is called balanced if for any nonterminal
A, and any pair of states $s, s' \in Q$, $\sum_{t \in Q} y_{(sAt)} = \sum_{t \in Q} y_{(s'At)}$. In other words,
y is balanced if the value of the sum $\sum_{t \in Q} y_{(sAt)}$ is independent of the state s.
As already observed, $q^{G \otimes D} \in \mathbb{R}^{d^2 n}$ is balanced. Let $\mathcal{B} \subseteq \mathbb{R}^{d^2 n}$ denote the set
of balanced vectors. Let us define the collapse mapping $\mathcal{C} : \mathcal{B} \to \mathbb{R}^n$. For any
$A \in V$, $\mathcal{C}(y)_A := \sum_t y_{(sAt)}$. Note: $\mathcal{C}(y)$ is well-defined, because for $y \in \mathcal{B}$, and
any $A \in V$, the sum $\sum_t y_{(sAt)}$ is by definition independent of the state s.

We next extend the definition of balance to matrices. A matrix $M \in \mathbb{R}^{d^2 n \times d^2 n}$
is called balanced if, for any nonterminals $B, C \in V$ and states $s, u \in Q$,
and for any pair of states $v, v' \in Q$, $\sum_t M_{(sBt),(uCv)} = \sum_t M_{(sBt),(uCv')}$, and
for any $s, v \in Q$ and $s', v' \in Q$, $\sum_{t,u} M_{(sBt),(uCv)} = \sum_{t,u} M_{(s'Bt),(uCv')}$. Let
$\mathcal{B}^{d^2 n \times d^2 n}$ denote the set of balanced matrices. We extend the collapse
map $\mathcal{C}$ to matrices: $\mathcal{C} : \mathcal{B}^{d^2 n \times d^2 n} \to \mathbb{R}^{n \times n}$ is defined as follows. For any $M \in \mathcal{B}^{d^2 n \times d^2 n}$ and
any $B, C \in V$, $\mathcal{C}(M)_{BC} := \sum_{t,u} M_{(sBt),(uCv)}$. Note, again, that $\mathcal{C}(M)$ is well-defined.
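The two matrix conditions can be checked the same way. In this sketch (storage conventions assumed by us) a $d^2 n \times d^2 n$ matrix is a 6-dimensional array with `M[s, B, t, u, C, v]` $= M_{(sBt),(uCv)}$. As a test case we use the Jacobian of the product MPS of a hypothetical SCFG with rule S → SS (p = 1/3), applying the product construction directly to that rule, whose entries are $p([u = s]\, y_{(vSt)} + [v = t]\, y_{(sSu)})$, at a balanced y with $\mathcal{C}(y)_S = 1$. Its collapse should be $2p \cdot \mathcal{C}(y)_S = 2/3$, the 1×1 Jacobian of the original PPS at $\mathcal{C}(y)$, in line with Lemma 1(i) below.

```python
import numpy as np

def is_balanced_matrix(M, tol=1e-12):
    """M has shape (d, n, d, d, n, d) with M[s, B, t, u, C, v] = M_(sBt),(uCv)."""
    row = M.sum(axis=2)                   # row[s, B, u, C, v] = sum_t M
    cond1 = np.all(np.abs(row - row[..., :1]) <= tol)           # independent of v
    full = M.sum(axis=(2, 3))             # full[s, B, C, v] = sum_{t,u} M
    cond2 = np.all(np.abs(full - full[:1, :, :, :1]) <= tol)    # independent of s and v
    return bool(cond1 and cond2)

def collapse_matrix(M):
    """C(M)_{BC} = sum_{t,u} M_(sBt),(uCv), evaluated at s = v = 0."""
    if not is_balanced_matrix(M):
        raise ValueError("collapse is only defined on balanced matrices")
    return M.sum(axis=(2, 3))[0, :, :, 0]

# Jacobian of the product MPS for S -> S S (p) at a balanced y (d = 2, n = 1).
p, d = 1 / 3, 2
y = np.array([[[0.25, 0.75]], [[0.75, 0.25]]])     # y[s, 0, t] = y_(sSt)
J = np.zeros((d, 1, d, d, 1, d))
for s, t, u, v in np.ndindex(d, d, d, d):
    # derivative of p * sum_w y_(sSw) y_(wSt) with respect to y_(uSv)
    J[s, 0, t, u, 0, v] = p * ((u == s) * y[v, 0, t] + (v == t) * y[s, 0, u])
print(collapse_matrix(J))    # [[0.66666667]], i.e. 2p
```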

We denote the Newton operator, $\mathcal{N}$, applied to a vector $x' \in \mathbb{R}^n$ for the
PPS $x = P_G(x)$ associated with G by $\mathcal{N}_G(x')$. Likewise, we denote the Newton
operator applied to a vector $y' \in \mathbb{R}^{d^2 n}$ for the MPS $y = P_{G \otimes D}(y)$ associated
with $G \otimes D$ by $\mathcal{N}_{G \otimes D}(y')$. For a real square matrix M, let $\rho(M)$ denote the
spectral radius of M. The main result of this section is the following:

Theorem 1. Let $x = P_G(x)$ be any PPS (or MPS), with n variables, associated
with a SCFG (or WCFG) G, and let $y = P_{G \otimes D}(y)$ be the corresponding product
MPS, for any DFA D, with d states. For any balanced vector $y \in \mathcal{B} \subseteq \mathbb{R}^{d^2 n}$
with $y \geq 0$, $\rho(B_{G \otimes D}(y)) = \rho(B_G(\mathcal{C}(y)))$. Furthermore, if $\rho(B_{G \otimes D}(y)) < 1$,
then $\mathcal{N}_{G \otimes D}(y)$ is defined and balanced, $\mathcal{N}_G(\mathcal{C}(y))$ is defined, and $\mathcal{C}(\mathcal{N}_{G \otimes D}(y)) =
\mathcal{N}_G(\mathcal{C}(y))$. Thus, $\mathcal{N}_{G \otimes D}$ preserves balance, and the collapse map $\mathcal{C}$ "commutes"
with $\mathcal{N}$ over non-negative balanced vectors, irrespective of what the DFA D is.

We prove this in the appendix via a series of lemmas that reveal many
algebraic/analytic properties of balance, collapse, and Newton's method. The key one is:

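Lemma 1. Let $\mathcal{B}_{\geq 0} = \mathcal{B} \cap \mathbb{R}^{d^2 n}_{\geq 0}$ and $\mathcal{B}^{d^2 n \times d^2 n}_{\geq 0} = \mathcal{B}^{d^2 n \times d^2 n} \cap \mathbb{R}^{d^2 n \times d^2 n}_{\geq 0}$.
We have $q^{G \otimes D} \in \mathcal{B}_{\geq 0}$ and $\mathcal{C}(q^{G \otimes D}) = q^G$, and:

(i) If $y \in \mathcal{B}_{\geq 0} \subseteq \mathbb{R}^{d^2 n}_{\geq 0}$, then $B_{G \otimes D}(y) \in \mathcal{B}^{d^2 n \times d^2 n}_{\geq 0}$, and $\mathcal{C}(B_{G \otimes D}(y)) = B_G(\mathcal{C}(y))$.

(ii) If $y \in \mathcal{B}_{\geq 0}$, then $P_{G \otimes D}(y) \in \mathcal{B}_{\geq 0}$, and $\mathcal{C}(P_{G \otimes D}(y)) = P_G(\mathcal{C}(y))$.

(iii) If $y \in \mathcal{B}_{\geq 0}$ and $\rho(B_G(\mathcal{C}(y))) < 1$, then $I - B_{G \otimes D}(y)$ is non-singular,
$(I - B_{G \otimes D}(y))^{-1} \in \mathcal{B}^{d^2 n \times d^2 n}_{\geq 0}$, and $\mathcal{C}((I - B_{G \otimes D}(y))^{-1}) = (I - B_G(\mathcal{C}(y)))^{-1}$.

(iv) If $y \in \mathcal{B}_{\geq 0}$ and $\rho(B_G(\mathcal{C}(y))) < 1$, then $\mathcal{N}_{G \otimes D}(y) \in \mathcal{B}$
and $\mathcal{C}(\mathcal{N}_{G \otimes D}(y)) = \mathcal{N}_G(\mathcal{C}(y))$.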

An easy consequence of Thm. 1 (and Prop. 3) is that if we use NM with
exact arithmetic on the PPS or MPS, $x = P_G(x)$, and on the product MPS,
$y = P_{G \otimes D}(y)$, they converge at the same rate:

Corollary 1. For any PPS or MPS, $x = P_G(x)$, with LFP $q^G > 0$, and
corresponding product MPS, $y = P_{G \otimes D}(y)$, if we use Newton's method with exact
arithmetic, starting at $x^{(0)} := 0$ and $y^{(0)} := 0$, then all the Newton iterates $x^{(k)}$
and $y^{(k)}$ are well-defined, and for all k: $x^{(k)} = \mathcal{C}(y^{(k)})$.
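Theorem 1 and Corollary 1 can be sanity-checked numerically; this is an illustration, not a proof, and every concrete choice in it is hypothetical. The sketch runs floating-point Newton iterations side by side on the one-variable PPS of S → SS (1/3) | a (2/3) and on its product with the 2-state parity DFA over {a} (applying the product construction directly to the two rules, for brevity). Storing $y_{(sSt)}$ as entry `Y[s, t]` of a d × d matrix makes the product equations read $Y = p\,YY + T$, and the Jacobian acts as $Z \mapsto p(ZY + YZ)$. At each step the code asserts that the collapse $\sum_t Y[s, t]$ equals the base iterate for every s, and that the two Jacobians have the same spectral radius.

```python
import numpy as np

p, d = 1 / 3, 2                     # hypothetical SCFG S -> S S (p) | a (1 - p)
DELTA_A = [1, 0]                    # parity DFA over {a}: Delta(s, a) = 1 - s
T = np.zeros((d, d))
for s in range(d):
    T[s, DELTA_A[s]] = 1 - p        # rule S -> a contributes to y_(s S Delta(s,a))

def newton_G(x):
    """Newton step for the one-variable PPS x = p x^2 + (1 - p)."""
    return x + (p * x**2 + (1 - p) - x) / (1 - 2 * p * x)

def B_prod(Y):
    """Jacobian of Y -> p Y Y + T as a (d*d) x (d*d) matrix, index s*d + t."""
    J = np.zeros((d * d, d * d))
    for s, t, u, v in np.ndindex(d, d, d, d):
        J[s * d + t, u * d + v] = p * ((u == s) * Y[v, t] + (v == t) * Y[s, u])
    return J

def newton_prod(Y):
    """Newton step for the product MPS Y = p Y Y + T."""
    rhs = (p * Y @ Y + T - Y).ravel()
    return Y + np.linalg.solve(np.eye(d * d) - B_prod(Y), rhs).reshape(d, d)

x, Y = 0.0, np.zeros((d, d))
for k in range(8):
    # Corollary 1: collapse(Y) = sum_t Y[s, t] equals x, for every start state s.
    assert np.allclose(Y.sum(axis=1), x)
    # Theorem 1: rho(B_{G x D}(Y)) = rho(B_G(collapse(Y))) = 2 p x.
    assert np.isclose(max(abs(np.linalg.eigvals(B_prod(Y)))), 2 * p * x)
    x, Y = newton_G(x), newton_prod(Y)
print(x, Y.sum(axis=1))             # both converge to q_S = 1
```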



4 Rounded Newton on PPSs and product MPSs 

To work in the Turing model of computation (as opposed to the unit-cost RAM 
model) we have to consider rounding between iterations of NM, as in [7]. 

Definition 1. (Rounded-down Newton's method (R-NM), with parameter h.) Given an MPS,
$x = P(x)$, with LFP $q^*$, where $q^* > 0$, in R-NM with integer rounding parameter
$h > 0$, we compute a sequence of iteration vectors $x^{[k]}$. Starting with $x^{[0]} := 0$,
for all $k \geq 0$ we compute $x^{[k+1]}$ as follows:

1. Compute $x^{\{k+1\}} := \mathcal{N}_P(x^{[k]})$, where $\mathcal{N}_P(x)$ is the Newton operator defined in (1).

2. For each coordinate $i = 1, \ldots, n$, set $x^{[k+1]}_i$ to be equal to the maximum multiple
of $2^{-h}$ which is $\leq \max(x^{\{k+1\}}_i, 0)$. (In other words, round down $x^{\{k+1\}}_i$
to the nearest multiple of $2^{-h}$, while ensuring the result is non-negative.)
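A minimal sketch of R-NM for a single equation, in exact rational arithmetic via Python's `fractions` module: each exact Newton step is followed by rounding down to a multiple of $2^{-h}$, which keeps every iterate's denominator at most $2^h$ and hence bounds the bit size of the numbers. The one-variable PPS $x = x^3/2 + 1/2$ (LFP $(\sqrt{5} - 1)/2$) and the choice h = 30 are hypothetical; for this non-critical example the iterates stay below the LFP and end up within a few multiples of $2^{-h}$ of it.

```python
import math
from fractions import Fraction

def round_down(v: Fraction, h: int) -> Fraction:
    """Largest multiple of 2^-h that is <= max(v, 0)."""
    scale = 2 ** h
    return Fraction(math.floor(max(v, Fraction(0)) * scale), scale)

def rounded_newton_1d(P, dP, h, iterations):
    """R-NM with parameter h for a one-variable system x = P(x), in exact
    rational arithmetic: an exact Newton step, then a round-down to 2^-h."""
    x = Fraction(0)
    iterates = [x]
    for _ in range(iterations):
        x = x + (P(x) - x) / (1 - dP(x))   # exact Newton step N(x)
        x = round_down(x, h)               # keeps the denominator <= 2^h
        iterates.append(x)
    return iterates

# Hypothetical PPS x = x^3/2 + 1/2, whose LFP is (sqrt(5) - 1)/2.
P = lambda x: x**3 / 2 + Fraction(1, 2)
dP = lambda x: 3 * x**2 / 2
xs = rounded_newton_1d(P, dP, h=30, iterations=12)
print(float(xs[-1]), (math.sqrt(5) - 1) / 2)
```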

Unfortunately, rounding can cause iterates $x^{[k]}$ to become unbalanced. Nevertheless,
we can handle this. For any PPS, $x = P(x)$, with Jacobian matrix $B(x)$, and
LFP $q^*$, $\rho(B(q^*)) \leq 1$ ([HIIZ|). If $\rho(B(q^*)) < 1$, we call the PPS non-critical.
Otherwise, if $\rho(B(q^*)) = 1$, we call the PPS critical. For SCFGs whose PPS
$x = P_G(x)$ is non-critical, we get good bounds, even though R-NM iterates can
become unbalanced:
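Criticality is a spectral-radius test on the Jacobian at the LFP. A minimal NumPy sketch, assuming the Jacobian at $q^*$ has already been computed; the one-variable examples $x = p x^2 + (1 - p)$ (from S → SS (p) | a (1 − p)) are hypothetical, and for $p \leq 1/2$ their LFP is $q^* = 1$, with 1×1 Jacobian $2p q^*$ there.

```python
import numpy as np

def spectral_radius(M):
    """Largest absolute value of an eigenvalue of the square matrix M."""
    return float(max(abs(np.linalg.eigvals(np.atleast_2d(np.asarray(M, dtype=float))))))

def is_critical(B_at_lfp, tol=1e-9):
    """A PPS is critical iff rho(B(q*)) = 1 (it is never larger); `tol`
    absorbs floating-point error in the eigenvalue computation."""
    return abs(spectral_radius(B_at_lfp) - 1) <= tol

# Hypothetical PPSs x = p x^2 + (1 - p); for p <= 1/2 the LFP is q* = 1.
for p in (1 / 2, 1 / 3):
    B = [[2 * p * 1.0]]             # 1x1 Jacobian 2 p q* at q* = 1
    print(p, spectral_radius(B), is_critical(B))
```

In the critical case p = 1/2 the Newton step simplifies to $x \leftarrow x + (1 - x)/2$, since $P(x) - x = (1 - x)^2/2$ and $1 - P'(x) = 1 - x$; the error therefore only halves per iteration, one reason critical systems need separate treatment.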

Theorem 2. For any $\epsilon > 0$, and for an SCFG, G, if the PPS $x = P_G(x)$ has
LFP $0 < q^G \leq 1$ and $\rho(B_G(q^G)) < 1$, then if we use R-NM with parameter
$h + 2$ to approximate the LFP solution of the MPS $y = P_{G \otimes D}(y)$, then
$\|q^{G \otimes D} - y^{[h+1]}\|_\infty \leq \epsilon$, where $h := 14|G| + 3 + \lceil \log(1/\epsilon) + \log d \rceil$.

Thus we can compute the probability $q^{G,D}_A = \sum_{t \in F} q^{G \otimes D}_{(s_0 A t)}$ within additive
error $\delta > 0$ in time polynomial in the input size: $|G|$, $|D|$ and $\log(1/\delta)$, in the
standard Turing model of computation.
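The precision parameter in Theorem 2 is easy to evaluate. A minimal sketch, assuming logarithms are base 2 and that |G| is the bit-size of the grammar; the input values below are made up.

```python
import math
from fractions import Fraction

def rnm_parameter_noncritical(G_size: int, eps: Fraction, d: int) -> int:
    """h := 14|G| + 3 + ceil(log(1/eps) + log d), with base-2 logs (assumed).
    R-NM is then run with rounding parameter h + 2, and the iterate
    y^[h+1] is within eps of the LFP of the product MPS."""
    return 14 * G_size + 3 + math.ceil(math.log2(1 / eps) + math.log2(d))

h = rnm_parameter_noncritical(G_size=100, eps=Fraction(1, 2**20), d=3)
print(h, h + 2)   # 1425 1427
```

The number of bits grows only linearly in |G|, log(1/ε) and log d, which is what keeps the overall running time polynomial.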

We in fact obtain a much more general result. For any SCFG, G, and corresponding
PPS, $x = P_G(x)$, with LFP $q^* > 0$, the dependency graph, $H_G = (V, E)$,
has the variables (or the nonterminals of G) as nodes and has the following edges:
$(x_i, x_j) \in E$ iff $x_j$ appears in some monomial in $P_G(x)_i$ with a positive coefficient.
We can decompose the dependency graph $H_G$ into its SCCs, and form the
DAG of SCCs, $\overline{H}_G$. For each SCC, S, suppose its corresponding equations are
$x_S = P_G(x_S, x_{D(S)})_S$, where $D(S)$ is the set of variables $x_j \notin S$ such that there
is a path in $H_G$ from some variable $x_i \in S$ to $x_j$. We call a SCC, S, of $H_G$, a
critical SCC if the PPS $x_S = P_G(x_S, q^G_{D(S)})_S$ is critical. In other words, the
SCC S is critical if, after we plug in the LFP values $q^G$ for the variables in lower
SCCs, $D(S)$, the resulting PPS is critical. We note that an arbitrary PPS,
$x = P_G(x)$, is non-critical if and only if it has no critical SCC. We define the
critical depth, $c(G)$, of $x = P_G(x)$ as follows: it is the maximum length, k, of
any sequence $S_1, S_2, \ldots, S_k$ of SCCs of $H_G$, such that for all $i \in \{1, \ldots, k-1\}$,
$S_{i+1} \subseteq D(S_i)$, and furthermore, such that for all $j \in \{1, \ldots, k\}$, $S_j$ is critical.
Let us call a critical SCC, S, of $H_G$ a bottom-critical SCC if $D(S)$ does not
contain any critical SCCs. By using earlier results ([8], [3]) we can compute in
P-time the critical SCCs of a PPS, and its critical depth (see the appendix).
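Once each SCC has been classified as critical or not (which requires the LFP values and a spectral-radius test, taken here as given), the critical depth is a longest-path computation on the DAG of SCCs: count the critical SCCs along every path and take the maximum. This sketch uses Tarjan's SCC algorithm on a dependency graph given as a dict from each variable to the variables occurring in its polynomial. The recursive implementation is meant for small examples, and the graphs and criticality flags used are hypothetical.

```python
from functools import lru_cache

def scc_ids(graph):
    """Tarjan's algorithm (recursive, so meant for small graphs).
    graph: dict node -> list of successor nodes. Returns dict node -> SCC id."""
    index, low, comp = {}, {}, {}
    stack, on_stack = [], set()
    next_index = next_comp = 0

    def visit(v):
        nonlocal next_index, next_comp
        index[v] = low[v] = next_index
        next_index += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:              # v is the root of an SCC: pop it
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp[w] = next_comp
                if w == v:
                    break
            next_comp += 1

    for v in list(graph):
        if v not in index:
            visit(v)
    return comp

def critical_depth(graph, critical_nodes):
    """Maximum number of critical SCCs on a path of the DAG of SCCs.
    critical_nodes: nodes whose SCC is critical (assumed precomputed)."""
    comp = scc_ids(graph)
    succ = {c: set() for c in comp.values()}
    for v, ws in graph.items():
        for w in ws:
            if comp[v] != comp[w]:
                succ[comp[v]].add(comp[w])
    crit = {comp[v] for v in critical_nodes}

    @lru_cache(maxsize=None)
    def depth(c):                           # best count on paths starting at SCC c
        below = max((depth(c2) for c2 in succ[c]), default=0)
        return below + (1 if c in crit else 0)

    return max((depth(c) for c in succ), default=0)

# Hypothetical dependency graph: x_A uses x_A, x_B; x_B uses x_B, x_C; x_C uses x_C.
g = {"A": ["A", "B"], "B": ["B", "C"], "C": ["C"]}
print(critical_depth(g, {"A", "C"}))   # 2
```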



PPSs with nested critical SCCs are hard to analyze directly. It turns out we 
can circumvent this by "tweaking" the probabilities in the SCFG G to obtain an 
SCFG G' with no critical SCCs, and showing that the "tweaks" are small enough 
so that they do not change the probabilities of interest by much. Concretely: 

Theorem 3. For any $\epsilon > 0$, and for any SCFG, $G$, in SNF form, with $q^G > 0$ and with critical depth $c(G)$, consider the new SCFG, $G'$, obtained from $G$ by the following process: for each bottom-critical SCC, $S$, of $x = P_G(x)$, find any rule $r = A \xrightarrow{p} B$ of $G$ such that $A$ and $B$ are both in $S$ (since $G$ is in SNF, such a rule must exist in every critical SCC). Reduce the probability $p$ by setting it to $p' = p\,(1 - 2^{-(14|G|+3)2^{c(G)}}\,\epsilon^{2^{c(G)}})$. Do this for all bottom-critical SCCs. This defines $G'$, which is non-critical. Using $G'$ instead of $G$, if we apply R-NM with parameter $h + 2$ to approximate the LFP $q^{G'\otimes D}$ of the MPS $y = P_{G'\otimes D}(y)$, then $\|q^{G\otimes D} - y^{[h+1]}\|_\infty \le \epsilon$, where $h := \lceil \log d + (3 \cdot 2^{c(G)} + 1)(\log(1/\epsilon) + 14|G| + 3) \rceil$. Thus we can compute $q^{G,D} = \sum_{t \in F} q^{G\otimes D}_{(s_0 S t)}$ within additive error $\delta > 0$ in time polynomial in $|G|$, $|D|$, $\log(1/\delta)$, and $2^{c(G)}$, in the Turing model of computation.
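The size of the tweak in Theorem 3 and the resulting Newton parameter can be evaluated directly. A small sketch, assuming base-2 logarithms (the text writes log without a base) and using hypothetical helper names; `G_size` stands for $|G|$, `c` for $c(G)$ and `d` for the number of DFA states:

```python
import math
from fractions import Fraction


def tweak_factor(G_size, c, eps):
    """Factor 1 - 2^{-(14|G|+3) 2^c} * eps^{2^c} by which p is multiplied to obtain p'."""
    e = 2 ** c
    return 1 - Fraction(1, 2 ** ((14 * G_size + 3) * e)) * Fraction(eps) ** e


def newton_parameter(G_size, c, d, eps):
    """h = ceil(log d + (3 * 2^c + 1) * (log(1/eps) + 14|G| + 3)), logs taken base 2."""
    return math.ceil(math.log2(d) + (3 * 2 ** c + 1) * (math.log2(1 / eps) + 14 * G_size + 3))
```

Both the exponent of the tweak and the coefficient $3 \cdot 2^{c} + 1$ in $h$ grow like $2^{c(G)}$, which is the exponential dependence on the critical depth in the bounds of Theorem 3.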

The proof is very involved, and is in the appendix. There, we also give a family of SCFGs, and a 3-state DFA that checks the infix probability of the string $aa$, and we explain why these examples indicate it will likely be difficult to overcome the exponential dependence on the critical depth $c(G)$ in the above bounds.

5 Non-criticality of SCFGs obtained by EM 

In doing parameter estimation for SCFGs, in either the supervised or unsupervised (EM) settings (see, e.g., [17]), we are given a CFG, $H$, with start nonterminal $S$, and we wish to extend it to an SCFG, $G$, by giving probabilities to the rules of $H$. We also have some probability distribution, $\mathcal{P}(\pi)$, over the complete derivations, $\pi$, of $H$ that start at the start nonterminal $S$. (In the unsupervised case, we begin with an SCFG, and the distribution $\mathcal{P}$ arises from the prior rule probabilities and from the training corpus of strings.) We then assign each rule of $H$ a (new) probability as follows to obtain (or update) $G$:

$$p_r := \frac{\sum_{\pi} \mathcal{P}(\pi)\, C(r, \pi)}{\sum_{\pi} \mathcal{P}(\pi)\, C(A, \pi)} \qquad \text{for each rule } r \in R_A,$$

where $C(r, \pi)$ is the number of times the rule $r$ is used in the complete derivation $\pi$, and $C(A, \pi) = \sum_{r \in R_A} C(r, \pi)$. This assignment only makes sense when the sums $\sum_{\pi} \mathcal{P}(\pi)\, C(A, \pi)$ are finite and nonzero, which we assume; we also assume every nonterminal and rule of $H$ appears in some complete derivation $\pi$ with $\mathcal{P}(\pi) > 0$.
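The rule-probability assignment above is a ratio of expected rule counts to expected nonterminal counts. A minimal sketch, assuming the distribution over complete derivations is given explicitly as a finite list of (weight, rule-count) pairs, which is a hypothetical encoding chosen for illustration (in EM these expected counts would instead come from the inside-outside computation); `estimate_rule_probs` is a hypothetical name:

```python
from collections import defaultdict
from fractions import Fraction


def estimate_rule_probs(weighted_derivations, lhs):
    """p_r = sum_pi P(pi) C(r, pi) / sum_pi P(pi) C(A, pi), for each rule r with left side A.

    weighted_derivations: list of (P(pi), {rule: C(r, pi)}) pairs
    lhs: dict mapping every rule of H to its left-hand nonterminal A
    Assumes every denominator sum_pi P(pi) C(A, pi) is nonzero, as in the text.
    """
    rule_mass = defaultdict(Fraction)  # sum_pi P(pi) * C(r, pi)
    nt_mass = defaultdict(Fraction)    # sum_pi P(pi) * C(A, pi)
    for weight, counts in weighted_derivations:
        for rule, count in counts.items():
            rule_mass[rule] += weight * count
            nt_mass[lhs[rule]] += weight * count
    return {rule: rule_mass[rule] / nt_mass[lhs[rule]] for rule in lhs}
```

For instance, giving weight $1/2$ to the derivation that uses $S \to a$ once, and weight $1/2$ to the derivation that applies $S \to SS$ once and $S \to a$ twice, yields $p(S \to SS) = 1/4$ and $p(S \to a) = 3/4$; the resulting PPS $x = \frac14 x^2 + \frac34$ has LFP $1$ and $B(1) = \frac12 < 1$, in line with Proposition 4.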

Proposition 4. If we use parameter estimation to obtain the SCFG $G$ using the above assignment, under the stated assumptions, then $G$ is consistent*, i.e. $q^G = 1$, and furthermore the PPS $x = P_G(x)$ is non-critical, i.e., $\rho(B_G(1)) < 1$.

It follows from Prop. 4 and Thm. 2 that for SCFGs obtained by parameter estimation and EM, we can compute the probability $q^{G,D}$ of generating a string in $L(D)$ to within any desired precision in P-time, for any DFA $D$.



* Consistency of the obtained SCFGs is well known; see, e.g., [15], [17] and references therein; also [19] has results related to Prop. 4 for restricted grammars.
A Proof of Theorem 1 (and of Lemma 1)



Theorem 1. Let $x = P_G(x)$ be any PPS (or MPS), with $n$ variables, associated with an SCFG (or WCFG) $G$, and let $y = P_{G\otimes D}(y)$ be the corresponding product MPS, for any DFA $D$ with $d$ states. For any balanced vector $y \in \mathbb{R}^{d^2 n}$ with $y \ge 0$, $\rho(B_{G\otimes D}(y)) = \rho(B_G(\mathfrak{C}(y)))$. Furthermore, if $\rho(B_{G\otimes D}(y)) < 1$, then $\mathcal{N}_{G\otimes D}(y)$ is defined and balanced, $\mathcal{N}_G(\mathfrak{C}(y))$ is defined, and $\mathfrak{C}(\mathcal{N}_{G\otimes D}(y)) = \mathcal{N}_G(\mathfrak{C}(y))$. Thus, $\mathcal{N}_{G\otimes D}$ preserves balance, and the collapse map $\mathfrak{C}$ "commutes" with $\mathcal{N}$ over non-negative balanced vectors, irrespective of what the DFA $D$ is.



We establish this via a series of lemmas that reveal many algebraic and analytic properties of balance, collapse, and their interplay with Newton's method. Lemma 2 first establishes a series of algebraic and analytic properties of arbitrary balanced vectors and matrices. Lemma 1 then uses these to establish properties of the specific balanced matrices and vectors arising during iterations of Newton's method on PPSs (and MPSs), and on the corresponding product MPSs. Theorem 1 is an immediate consequence of the parts of Lemma 1 below.

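Lemma 2. Consider the set $\mathfrak{B} \subseteq \mathbb{R}^{d^2 n}$ of balanced vectors, and the set $\mathfrak{B}^{\times} \subseteq \mathbb{R}^{d^2 n \times d^2 n}$ of balanced matrices. Let $\mathfrak{B}_{\ge 0} = \mathfrak{B} \cap \mathbb{R}^{d^2 n}_{\ge 0}$ and $\mathfrak{B}^{\times}_{\ge 0} = \mathfrak{B}^{\times} \cap \mathbb{R}^{d^2 n \times d^2 n}_{\ge 0}$.

(i) $\mathfrak{B}$ and $\mathfrak{B}^{\times}$ are both closed under linear combinations. In other words: $\sum_i a_i v^{(i)} \in \mathfrak{B}$ and $\sum_i a_i M^{(i)} \in \mathfrak{B}^{\times}$, if $\forall i$, $a_i \in \mathbb{R}$, $v^{(i)} \in \mathfrak{B}$ and $M^{(i)} \in \mathfrak{B}^{\times}$. Furthermore, $\mathfrak{C}$ is a linear map on both $\mathfrak{B}$ and $\mathfrak{B}^{\times}$. In other words:
$$\mathfrak{C}\Big(\sum_i a_i v^{(i)}\Big) = \sum_i a_i\, \mathfrak{C}(v^{(i)}) \quad\text{and}\quad \mathfrak{C}\Big(\sum_i a_i M^{(i)}\Big) = \sum_i a_i\, \mathfrak{C}(M^{(i)})$$
whenever, $\forall i$, $a_i \in \mathbb{R}$, $v^{(i)} \in \mathfrak{B}$, and $M^{(i)} \in \mathfrak{B}^{\times}$.

(ii) If $M \in \mathfrak{B}^{\times}$ and $v \in \mathfrak{B}$, then $Mv \in \mathfrak{B}$ and $\mathfrak{C}(Mv) = \mathfrak{C}(M)\mathfrak{C}(v)$.

(iii) If $M, M' \in \mathfrak{B}^{\times}$, then $MM' \in \mathfrak{B}^{\times}$ and $\mathfrak{C}(MM') = \mathfrak{C}(M)\mathfrak{C}(M')$.

(iv) If $M \in \mathfrak{B}^{\times}_{\ge 0}$, and $v \in \mathbb{R}^{d^2 n}$ is any vector, then $\mathfrak{C}(Mv) \ge \mathfrak{C}(M)\mathfrak{C}(v)$, where we extend the map $\mathfrak{C}$ to arbitrary $v' \in \mathbb{R}^{d^2 n}$ by letting $\mathfrak{C}(v')_A := \min_s \sum_t v'_{(sAt)}$.

(v) If $M \in \mathfrak{B}^{\times}_{\ge 0}$, then $\rho(M) = \rho(\mathfrak{C}(M))$. In other words, the collapse operator $\mathfrak{C}$ preserves the spectral radius of balanced non-negative matrices.

(vi) If $v \in \mathfrak{B}_{\ge 0}$, then $\|v\|_\infty \le \|\mathfrak{C}(v)\|_\infty$. If $M \in \mathfrak{B}^{\times}_{\ge 0}$, then $\|M\|_\infty \le d\,\|\mathfrak{C}(M)\|_\infty$.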

Proof. 

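(i): This can be verified directly from the definitions of balance and collapse. In particular, for any nonterminal $A \in V$, and any states $s, s' \in Q$:
$$\sum_t \Big(\sum_i a_i v^{(i)}\Big)_{(sAt)} = \sum_i a_i \sum_t v^{(i)}_{(sAt)} = \sum_i a_i \sum_t v^{(i)}_{(s'At)} = \sum_t \Big(\sum_i a_i v^{(i)}\Big)_{(s'At)},$$
where the middle equality holds because every $v^{(i)}$ is balanced. Also, $\mathfrak{C}(\sum_i a_i v^{(i)})_A = \sum_t (\sum_i a_i v^{(i)})_{(sAt)} = \sum_i a_i\, \mathfrak{C}(v^{(i)})_A$. Likewise, for any nonterminals $B, C \in V$, and any states $s, u \in Q$ and $v, v' \in Q$:
$$\sum_t \Big(\sum_i a_i M^{(i)}\Big)_{(sBt),(uCv)} = \sum_i a_i \sum_t M^{(i)}_{(sBt),(uCv)} = \sum_i a_i \sum_t M^{(i)}_{(sBt),(uCv')} = \sum_t \Big(\sum_i a_i M^{(i)}\Big)_{(sBt),(uCv')},$$
because every $M^{(i)}$ is balanced. Similarly, for any nonterminals $B, C$, and any states $s, v, s', v' \in Q$:
$$\sum_{t,u} \Big(\sum_i a_i M^{(i)}\Big)_{(sBt),(uCv)} = \sum_i a_i \sum_{t,u} M^{(i)}_{(sBt),(uCv)} = \sum_i a_i \sum_{t,u} M^{(i)}_{(s'Bt),(uCv')} = \sum_{t,u} \Big(\sum_i a_i M^{(i)}\Big)_{(s'Bt),(uCv')},$$
again because every $M^{(i)}$ is balanced. Now,
$$\mathfrak{C}\Big(\sum_i a_i M^{(i)}\Big)_{B,C} := \sum_{t,u} \Big(\sum_i a_i M^{(i)}\Big)_{(sBt),(uCv)} = \sum_i a_i \sum_{t,u} M^{(i)}_{(sBt),(uCv)} = \sum_i a_i\, \mathfrak{C}(M^{(i)})_{B,C}.$$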

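(ii): For any nonterminal $B$ and state $s$:
$$\begin{aligned}
\sum_t (Mv)_{(sBt)} &= \sum_{t,u,C,z} M_{(sBt),(uCz)}\, v_{(uCz)} = \sum_{u,C,z} \Big(\sum_t M_{(sBt),(uCz)}\Big) v_{(uCz)} \\
&= \sum_{C,u} \Big(\sum_t M_{(sBt),(uCz)}\Big) \sum_z v_{(uCz)} && \text{(since $M$ is balanced)} \\
&= \sum_{C,u} \Big(\sum_t M_{(sBt),(uCz)}\Big) \mathfrak{C}(v)_C && \text{(since $v$ is balanced)} \\
&= \sum_C \Big(\sum_{t,u} M_{(sBt),(uCz)}\Big) \mathfrak{C}(v)_C \\
&= \sum_C \mathfrak{C}(M)_{B,C}\, \mathfrak{C}(v)_C && \text{(since $M$ is balanced)} \\
&= (\mathfrak{C}(M)\mathfrak{C}(v))_B,
\end{aligned}$$
which is independent of $s$. (From the second line on, $\sum_t M_{(sBt),(uCz)}$ does not depend on $z$, so it can be taken outside the sum over $z$.) So $\mathfrak{C}(Mv)_B = \sum_t (Mv)_{(sBt)} = (\mathfrak{C}(M)\mathfrak{C}(v))_B$.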



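(iii): For any nonterminals $D, E$, and states $s, w, x \in Q$:
$$\begin{aligned}
\sum_t (MM')_{(sDt),(wEx)} &= \sum_{t,u,C,v} M_{(sDt),(uCv)}\, M'_{(uCv),(wEx)} = \sum_{u,C,v} \Big(\sum_t M_{(sDt),(uCv)}\Big) M'_{(uCv),(wEx)} \\
&= \sum_{C,u} \Big(\sum_t M_{(sDt),(uCv)}\Big) \sum_v M'_{(uCv),(wEx)} && \text{(since $M$ is balanced)}.
\end{aligned}$$
Since $M' \in \mathfrak{B}^{\times}$, the last sum is independent of $x$, which is what we aimed to show. Next consider:
$$\begin{aligned}
\sum_{t,w} (MM')_{(sDt),(wEx)} &= \sum_{t,w,u,C,v} M_{(sDt),(uCv)}\, M'_{(uCv),(wEx)} = \sum_{u,w,C,v} \Big(\sum_t M_{(sDt),(uCv)}\Big) M'_{(uCv),(wEx)} \\
&= \sum_{C,u,w} \Big(\sum_t M_{(sDt),(uCv)}\Big) \sum_v M'_{(uCv),(wEx)} && \text{(since $M$ is balanced)} \\
&= \sum_{C,u} \Big(\sum_t M_{(sDt),(uCv)}\Big) \sum_{v,w} M'_{(uCv),(wEx)} \\
&= \sum_{C,u} \Big(\sum_t M_{(sDt),(uCv)}\Big) \mathfrak{C}(M')_{C,E} && \text{(since $M'$ is balanced)} \\
&= \sum_C \mathfrak{C}(M)_{D,C}\, \mathfrak{C}(M')_{C,E} && \text{(since $M$ is balanced)} \\
&= (\mathfrak{C}(M)\mathfrak{C}(M'))_{D,E}.
\end{aligned}$$
So $\sum_{t,w} (MM')_{(sDt),(wEx)}$ is independent of $s$ and $x$, and $\mathfrak{C}(MM')_{D,E} = \sum_{t,w} (MM')_{(sDt),(wEx)} = (\mathfrak{C}(M)\mathfrak{C}(M'))_{D,E}$, for any $D, E \in V$.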

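(iv): For any nonterminal $B$ and state $s$:
$$\begin{aligned}
\sum_t (Mv)_{(sBt)} &= \sum_{t,u,C,z} M_{(sBt),(uCz)}\, v_{(uCz)} = \sum_{u,C,z} \Big(\sum_t M_{(sBt),(uCz)}\Big) v_{(uCz)} \\
&= \sum_{C,u} \Big(\sum_t M_{(sBt),(uCz)}\Big) \sum_z v_{(uCz)} && \text{(since $M$ is balanced)} \\
&\ge \sum_{C,u} \Big(\sum_t M_{(sBt),(uCz)}\Big) \min_{u'} \sum_z v_{(u'Cz)} && \text{(since $\textstyle\sum_t M_{(sBt),(uCz)} \ge 0$ for any $C, u$)} \\
&= \sum_C \mathfrak{C}(M)_{B,C}\, \mathfrak{C}(v)_C = (\mathfrak{C}(M)\mathfrak{C}(v))_B.
\end{aligned}$$
Since this holds for any $B$ and any $s$, $\mathfrak{C}(Mv)_B = \min_s \sum_t (Mv)_{(sBt)} \ge (\mathfrak{C}(M)\mathfrak{C}(v))_B$.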



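(vi) (we will prove part (v) below): Since $v \in \mathfrak{B}_{\ge 0}$, $v_{(sAt)} \le \sum_{t'} v_{(sAt')} = \mathfrak{C}(v)_A$, so $\|v\|_\infty \le \|\mathfrak{C}(v)\|_\infty$. For $M \in \mathfrak{B}^{\times}_{\ge 0}$:
$$\begin{aligned}
\|M\|_\infty &= \max_{s,B,t} \sum_{u,C,v} M_{(sBt),(uCv)} \le \max_{s,B} \sum_{u,C,v,t} M_{(sBt),(uCv)} \\
&= \max_{s,B} \sum_{C,v} \mathfrak{C}(M)_{B,C} = \max_B \sum_C d\, \mathfrak{C}(M)_{B,C} = d\,\|\mathfrak{C}(M)\|_\infty.
\end{aligned}$$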

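(v): By standard facts from Perron-Frobenius theory (see e.g. Theorem 8.3.1 of [10]), the non-negative matrix $\mathfrak{C}(M)$ has $\rho(\mathfrak{C}(M))$ as an eigenvalue, with an associated non-negative eigenvector $v_0 \ne 0$. That is, $\mathfrak{C}(M)v_0 = \rho(\mathfrak{C}(M))v_0$ for some non-zero $v_0 \ge 0$. If $\rho(\mathfrak{C}(M)) = 0$, then $\rho(M) \ge \rho(\mathfrak{C}(M))$ holds trivially, so assume $\rho(\mathfrak{C}(M)) > 0$. Now consider any non-negative balanced vector $u$ with $\mathfrak{C}(u) = v_0$. (Such a $u$ obviously exists.) Let $f(u) = \frac{1}{\rho(\mathfrak{C}(M))} Mu$. By part (ii), $Mu$ is balanced and $\mathfrak{C}(Mu) = \mathfrak{C}(M)v_0 = \rho(\mathfrak{C}(M))v_0$. So $f(u)$ is non-negative and balanced and has $\mathfrak{C}(f(u)) = v_0$. The set of non-negative balanced vectors $u$ with $\mathfrak{C}(u) = v_0$ is compact and convex (it is a product of simplices), and the continuous function $f$ maps this set into itself. So by Brouwer's fixed point theorem, $f$ has a fixed point, that is, a $u^*$ with $u^* = \frac{1}{\rho(\mathfrak{C}(M))} Mu^*$. That is, $u^*$ is an eigenvector of $M$ with eigenvalue $\rho(\mathfrak{C}(M))$ (note that $u^* \ne 0$, since $\mathfrak{C}(u^*) = v_0 \ne 0$). So $\rho(M) \ge \rho(\mathfrak{C}(M))$.

In the other direction, we use the fact (see, e.g., Theorem 5.6.12 of [10]) that for any square matrix $N$, $\lim_{k\to\infty} \|N^k\|_\infty = 0$ if and only if $\rho(N) < 1$.

Now for $M \in \mathfrak{B}^{\times}_{\ge 0}$ assume, for contradiction, that $\rho(M) > \rho(\mathfrak{C}(M))$. Then
$$\rho\Big(\tfrac{1}{\rho(M)} M\Big) = \tfrac{\rho(M)}{\rho(M)} = 1 > \tfrac{\rho(\mathfrak{C}(M))}{\rho(M)} = \rho\Big(\tfrac{1}{\rho(M)}\mathfrak{C}(M)\Big).$$
Thus, by the above fact from matrix theory, we have that $\lim_{k\to\infty} \|(\frac{1}{\rho(M)}\mathfrak{C}(M))^k\|_\infty = 0$. But for any $k \ge 1$,
$$\begin{aligned}
\Big\|\Big(\tfrac{1}{\rho(M)}M\Big)^k\Big\|_\infty &\le d\,\Big\|\mathfrak{C}\Big(\Big(\tfrac{1}{\rho(M)}M\Big)^k\Big)\Big\|_\infty && \text{(by part (vi))} \\
&= d\,\Big\|\Big(\mathfrak{C}\Big(\tfrac{1}{\rho(M)}M\Big)\Big)^k\Big\|_\infty && \text{(by part (iii))} \\
&= d\,\Big\|\Big(\tfrac{1}{\rho(M)}\mathfrak{C}(M)\Big)^k\Big\|_\infty && \text{(by part (i))}.
\end{aligned}$$
And thus, since the right-hand side goes to $0$ as $k \to \infty$, we must also have $\lim_{k\to\infty} \|(\frac{1}{\rho(M)}M)^k\|_\infty = 0$, but this is a contradiction, because $\rho(\frac{1}{\rho(M)}M) = 1$. So our assumption $\rho(M) > \rho(\mathfrak{C}(M))$ must be false.

Having established both directions, we conclude that $\rho(M) = \rho(\mathfrak{C}(M))$. □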



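Lemma 1. Let $\mathfrak{B}_{\ge 0} = \mathfrak{B} \cap \mathbb{R}^{d^2 n}_{\ge 0}$ and $\mathfrak{B}^{\times}_{\ge 0} = \mathfrak{B}^{\times} \cap \mathbb{R}^{d^2 n \times d^2 n}_{\ge 0}$.

Let $B_G(x)$ denote the Jacobian of the PPS (or MPS) $x = P_G(x)$, and let $B_{G\otimes D}(y)$ be the Jacobian of the MPS $y = P_{G\otimes D}(y)$. Then $q^{G\otimes D} \in \mathfrak{B}_{\ge 0}$ and $\mathfrak{C}(q^{G\otimes D}) = q^G$, and:

(i) If $y \in \mathfrak{B}_{\ge 0} \subseteq \mathfrak{B}$, then $B_{G\otimes D}(y) \in \mathfrak{B}^{\times}_{\ge 0}$, and $\mathfrak{C}(B_{G\otimes D}(y)) = B_G(\mathfrak{C}(y))$.

(ii) If $y \in \mathfrak{B}_{\ge 0}$, then $P_{G\otimes D}(y) \in \mathfrak{B}_{\ge 0}$, and $\mathfrak{C}(P_{G\otimes D}(y)) = P_G(\mathfrak{C}(y))$.

(iii) If $y \in \mathfrak{B}_{\ge 0}$ and $\rho(B_G(\mathfrak{C}(y))) < 1$, then $I - B_{G\otimes D}(y)$ is non-singular, $(I - B_{G\otimes D}(y))^{-1} \in \mathfrak{B}^{\times}_{\ge 0}$, and $\mathfrak{C}((I - B_{G\otimes D}(y))^{-1}) = (I - B_G(\mathfrak{C}(y)))^{-1}$.

(iv) If $y \in \mathfrak{B}_{\ge 0}$ and $\rho(B_G(\mathfrak{C}(y))) < 1$, then $\mathcal{N}_{G\otimes D}(y) \in \mathfrak{B}$, and $\mathfrak{C}(\mathcal{N}_{G\otimes D}(y)) = \mathcal{N}_G(\mathfrak{C}(y))$.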

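Proof.

Firstly, let us recall why $q^{G\otimes D} \in \mathfrak{B}_{\ge 0}$ and $\mathfrak{C}(q^{G\otimes D}) = q^G$. Recall that these are the LFP $q^G$ of $x = P_G(x)$, and the LFP $q^{G\otimes D}$ of $y = P_{G\otimes D}(y)$. By Propositions 1 and 2, for any nonterminal $A \in V$, $q^G_A = \sum_{w \in \Sigma^*} p_{A,w}$, where $p_{A,w}$ denotes the probability (weight) that $G$, starting at $A$, generates the finite string $w$. Likewise $q^{G\otimes D}_{(sAt)} = \sum_{\{w \mid \Delta^*(s,w) = t\}} p_{A,w}$ is the probability (weight) that, starting at $A$, $G$ generates a finite string $w$ such that $\Delta^*(s, w) = t$. Thus, clearly, for any $A \in V$ and any $s \in Q$, $q^G_A = \sum_{t \in Q} q^{G\otimes D}_{(sAt)} = \mathfrak{C}(q^{G\otimes D})_A$. Now we prove the enumerated assertions one by one: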

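(i): We need to argue both that $B_{G\otimes D}(y) \in \mathfrak{B}^{\times}_{\ge 0}$ and that $\mathfrak{C}(B_{G\otimes D}(y)) = B_G(\mathfrak{C}(y))$, for $y \in \mathfrak{B}_{\ge 0}$. Again, recall that we are assuming wlog that $G$ is in SNF form. We split the proof into cases depending on the type of the nonterminal $A$ in $B_{G\otimes D}(y)_{(sAt),(uEv)}$. Let $\delta_{\alpha,\beta}$ denote the Kronecker delta: $\delta_{\alpha,\beta} := 1$ if $\alpha = \beta$, and $\delta_{\alpha,\beta} := 0$ if $\alpha \ne \beta$.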

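Type Q: For any nonterminal $A$ of type Q, the only rule in $R_A$ has the form $A \xrightarrow{1} BC$, and $P_G(x)_A = x_B x_C$. And, for any states $s, t \in Q$, $P_{G\otimes D}(y)_{(sAt)} = \sum_{w \in Q} y_{(sBw)}\, y_{(wCt)}$. Thus
$$B_{G\otimes D}(y)_{(sAt),(uEv)} = \frac{\partial P_{G\otimes D}(y)_{(sAt)}}{\partial y_{(uEv)}} = \delta_{t,v}\cdot\delta_{E,C}\cdot y_{(sBu)} + \delta_{s,u}\cdot\delta_{E,B}\cdot y_{(vCt)}.$$
Thus
$$\sum_t B_{G\otimes D}(y)_{(sAt),(uEv)} = \delta_{E,C}\cdot y_{(sBu)} + \delta_{s,u}\cdot\delta_{E,B}\cdot\sum_t y_{(vCt)}.$$
Since $y$ is balanced, $\sum_t y_{(vCt)}$ is independent of $v$, so $\sum_t B_{G\otimes D}(y)_{(sAt),(uEv)}$ is independent of $v$. Next we note that:
$$\sum_{t,u} B_{G\otimes D}(y)_{(sAt),(uEv)} = \delta_{E,C}\sum_u y_{(sBu)} + \delta_{E,B}\sum_t y_{(vCt)}.$$
Thus
$$\sum_{t,u} B_{G\otimes D}(y)_{(sAt),(uEv)} = \delta_{E,C}\,\mathfrak{C}(y)_B + \delta_{E,B}\,\mathfrak{C}(y)_C = B_G(\mathfrak{C}(y))_{A,E}.$$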



Type T: For any nonterminal $A$ of type T, $P_G(x)_A$ does not depend on $x$, and $P_{G\otimes D}(y)_{(sAt)}$ does not depend on $y$, for any $s, t \in Q$. Thus $\sum_t B_{G\otimes D}(y)_{(sAt),(uCv)} = 0$, and $\sum_{t,u} B_{G\otimes D}(y)_{(sAt),(uCv)} = 0 = B_G(\mathfrak{C}(y))_{A,C}$.

Type L: For any nonterminal $A$ of type L, recall that $P_G(x)_A = \sum_{r \in R_A} p_r\, x_{B_r}$, where $r = A \xrightarrow{p_r} B_r$. And for any states $s, t$, $P_{G\otimes D}(y)_{(sAt)} = \sum_{r \in R_A} p_r\, y_{(sB_r t)}$.

Thus, all the entries $B_G(x)_{A,C}$ and $B_{G\otimes D}(y)_{(sAt),(uCv)}$ are independent of $x$ and $y$, respectively. And
$$B_{G\otimes D}(y)_{(sAt),(uCv)} = \frac{\partial P_{G\otimes D}(y)_{(sAt)}}{\partial y_{(uCv)}} = \delta_{s,u}\cdot\delta_{t,v}\cdot B_G(x)_{A,C}.$$
Consequently $\sum_t B_{G\otimes D}(y)_{(sAt),(uCv)} = \delta_{s,u}\, B_G(x)_{A,C}$, which is independent of $v$. And $\sum_{t,u} B_{G\otimes D}(y)_{(sAt),(uCv)} = B_G(x)_{A,C}$, which is independent of $s$ and $v$, and $B_G(x)_{A,C} = B_G(\mathfrak{C}(y))_{A,C}$, because $B_G(x)_{A,C}$ is independent of $x$.

We have thus shown that for all nonterminals $A$ and $C$, and all states $s, u \in Q$, the sum $\sum_t B_{G\otimes D}(y)_{(sAt),(uCv)}$ is independent of $v$. We have also shown that for all nonterminals $A$ and $C$, the sum $\sum_{t,u} B_{G\otimes D}(y)_{(sAt),(uCv)}$ is independent of $s$ and $v$, and furthermore, that the latter sum (which is by definition $\mathfrak{C}(B_{G\otimes D}(y))_{A,C}$) is equal to $B_G(\mathfrak{C}(y))_{A,C}$. Thus our proof for part (i) is complete.
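The Type Q calculation can be checked mechanically on a small instance. The sketch below (hypothetical helper names, exact rational arithmetic) builds the Jacobian rows of $P_{G\otimes D}(y)_{(sAt)} = \sum_w y_{(sBw)} y_{(wCt)}$ directly from the product rule, without using the closed form above, so that both balance conditions and the collapse identity can be tested independently of the derivation:

```python
from fractions import Fraction
from itertools import product


def typeQ_product_jacobian(y, Q, V, A, B, C):
    """Entries d P_{G(x)D}(y)_{(sAt)} / d y_{(uEv)} for an SNF rule A -> BC.

    y: dict mapping (state, nonterminal, state) triples to values
    Uses P_{(sAt)}(y) = sum_w y[(s,B,w)] * y[(w,C,t)] and the product rule term by term.
    """
    J = {}
    for s, t, u, E, v in product(Q, Q, Q, V, Q):
        deriv = Fraction(0)
        for w in Q:
            if (s, B, w) == (u, E, v):  # the derivative hits the first factor
                deriv += y[(w, C, t)]
            if (w, C, t) == (u, E, v):  # the derivative hits the second factor
                deriv += y[(s, B, w)]
        J[((s, A, t), (u, E, v))] = deriv
    return J


def collapse_vector(y, Q, V):
    """C(y)_X = sum_t y[(s,X,t)]; for balanced y this is the same for every s."""
    return {X: sum(y[(Q[0], X, t)] for t in Q) for X in V}
```

With $Q = \{0, 1\}$ and any balanced $y$, summing these entries over $t$ gives a value independent of $v$, and summing over $t$ and $u$ gives $\delta_{E,C}\,\mathfrak{C}(y)_B + \delta_{E,B}\,\mathfrak{C}(y)_C$, matching the Type Q case.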

(ii): Part (ii) could be proved using a case-by-case analysis similar to part (i). Instead, we shall use part (i). Recall that $P_G(x)$ and $P_{G\otimes D}(y)$ have no polynomials of degree more than 2. Furthermore:
$$P_G(x) = P_G(0) + B_G(\tfrac12 x)\,x$$
and
$$P_{G\otimes D}(y) = P_{G\otimes D}(0) + B_{G\otimes D}(\tfrac12 y)\,y.$$
By the previous parts of this Lemma, and by Lemma 2, we know that $B_{G\otimes D}(\frac12 y)\,y$ is balanced, and $\mathfrak{C}(B_{G\otimes D}(\frac12 y)\,y) = B_G(\frac12\mathfrak{C}(y))\,\mathfrak{C}(y)$. All that remains is to show that $P_{G\otimes D}(0)$ is balanced and that $\mathfrak{C}(P_{G\otimes D}(0)) = P_G(0)$, and again use the properties established in Lemma 2.

Now, unless a nonterminal $A$ has type T, $P_G(0)_A = 0$, and for any states $s, t \in Q$, $P_{G\otimes D}(0)_{(sAt)} = 0$. So, in these cases, there is nothing to prove. If the nonterminal $A$ does have type T, then $P_G(x)_A = 1$. If there is a rule $A \to a$, for some $a \in \Sigma$, then for any state $s \in Q$, there is a unique state $t' \in Q$ with $\Delta(s, a) = t'$. If instead there is a rule $A \to \epsilon$, then let $t' := s$. In both cases, note that $\sum_t P_{G\otimes D}(y)_{(sAt)} = 1 = P_G(\mathfrak{C}(y))_A$, since $P_{G\otimes D}(y)_{(sAt)} = 1$ when $t = t'$ and $P_{G\otimes D}(y)_{(sAt)} = 0$ otherwise. Thus also $\mathfrak{C}(P_{G\otimes D}(y)) = P_G(\mathfrak{C}(y))$ in all cases.



(iii): By assumption, $\rho(B_G(\mathfrak{C}(y))) < 1$, so by part (i) and Lemma 2(v), $\rho(B_{G\otimes D}(y)) < 1$. It is a basic fact that for any square $M \ge 0$, if $\rho(M) < 1$ then $(I - M)$ is non-singular and $(I - M)^{-1} = \sum_{i=0}^\infty M^i$. (See, e.g., [13], Theorem 15.2.2, page 531.) Thus $I - B_{G\otimes D}(y)$ is non-singular, and $(I - B_{G\otimes D}(y))^{-1} = \sum_{i=0}^\infty (B_{G\otimes D}(y))^i$. Note that each $(B_{G\otimes D}(y))^i$, for $i \ge 0$, is balanced, by using the previous parts of this Lemma and Lemma 2(iii), and thus so are the partial sums $\sum_{i=0}^k (B_{G\otimes D}(y))^i$ for any $k \ge 0$. Therefore $(I - B_{G\otimes D}(y))^{-1} = \lim_{k\to\infty} \sum_{i=0}^k (B_{G\otimes D}(y))^i$ is a limit of balanced non-negative matrices. But then $(I - B_{G\otimes D}(y))^{-1}$ must be balanced, because the definition of balance for a matrix $M$ requires equalities between continuous (in fact, linear) functions of the entries, and thus if all the matrices $\sum_{i=0}^k (B_{G\otimes D}(y))^i$ satisfy these conditions, then so does their limit. Furthermore, $\mathfrak{C}$ is a linear and continuous function on matrices, so
$$\mathfrak{C}\big((I - B_{G\otimes D}(y))^{-1}\big) = \sum_{i=0}^\infty \mathfrak{C}\big((B_{G\otimes D}(y))^i\big) = \sum_{i=0}^\infty \big(\mathfrak{C}(B_{G\otimes D}(y))\big)^i = \big(I - \mathfrak{C}(B_{G\otimes D}(y))\big)^{-1}.$$
By part (i) of this Lemma, this is equal to $(I - B_G(\mathfrak{C}(y)))^{-1}$. Done.

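(iv): By part (ii) of this Lemma, $P_{G\otimes D}(y)$ is balanced and $\mathfrak{C}(P_{G\otimes D}(y)) = P_G(\mathfrak{C}(y))$. Part (iii) of this lemma says that $(I - B_{G\otimes D}(y))^{-1}$ is balanced and $\mathfrak{C}((I - B_{G\otimes D}(y))^{-1}) = (I - B_G(\mathfrak{C}(y)))^{-1}$. Now we can apply the various algebraic properties of balanced vectors and matrices from Lemma 2 to conclude that
$$\mathcal{N}_{G\otimes D}(y) := y + (I - B_{G\otimes D}(y))^{-1}(P_{G\otimes D}(y) - y)$$
is balanced and that
$$\mathfrak{C}(\mathcal{N}_{G\otimes D}(y)) = \mathfrak{C}(y) + (I - B_G(\mathfrak{C}(y)))^{-1}\big(P_G(\mathfrak{C}(y)) - \mathfrak{C}(y)\big) = \mathcal{N}_G(\mathfrak{C}(y)). \qquad \square$$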

As mentioned already, Theorem 1 follows immediately from Lemma 1, parts (i) and (iv) (together with Lemma 2(v)).

B Proofs for Section 4 

We will first show how to compute in P-time the critical SCCs and the critical depth of a PPS. We then proceed to prove the main theorems of the section: Theorems 2 and 3.

Let $x = P(x)$ be a PPS (wlog in SNF), with LFP $q^* > 0$, let $B(x)$ be its Jacobian matrix, and let $H = (V, E)$ be its dependency graph. If $B$ is a square matrix and $I, J$ are subsets of indices, we will use $B_{I,J}$ to denote the submatrix with rows in $I$ and columns in $J$, and we use $B_I$ to denote the square submatrix $B_{I,I}$.

Proposition 5. Given a PPS $x = P(x)$ with LFP $q^* > 0$, we can compute in polynomial time its critical SCCs and its critical depth.

Proof. We know that for each SCC $S$ of $H$, either all the variables (nodes) of the SCC have value 1 in the LFP $q^*$, or they all have value $< 1$; moreover, if they have value 1, then so do all the variables that they can reach in $H$, i.e., $q^*_S = 1$ implies $q^*_{D(S)} = 1$ [8]. Furthermore, we can determine which variables and SCCs have value 1, and which have value $< 1$, in polynomial time [8] (this was improved to strongly polynomial time in [3]). We also know that $\rho(B(q^*)) \le 1$; thus a PPS is critical iff $\rho(B(q^*)) = 1$. Furthermore, by Theorem 3.6 of [7], if $q^* < 1$, then $\rho(B(q^*)) < 1$.

Therefore, for each SCC $S$, we can determine whether it is critical as follows. If $q^*_S < 1$ then $S$ is not critical. If $q^*_S = 1$, then $S$ is critical iff $\rho(B(1)_S) = 1$, and it is not critical iff $\rho(B(1)_S) < 1$; we can determine which of the two is the case as follows. Since the spectral radius of $B(1)_S$ is at most 1, $\rho(B(1)_S) = 1$ iff there is a vector $u \ne 0$ such that $B(1)_S\, u = u$ (and we can take $u \ge 0$ to be an eigenvector for the eigenvalue 1 in this case, since the matrix is nonnegative). Equivalently, since the constraints are homogeneous in $u$, this is the case iff the set of linear equations $\{B(1)_S\, u = u;\ u_1 = 1\}$ has a solution. (The normalisation $u_1 = 1$ loses nothing: when $S$ is nontrivial, $B(1)_S$ is irreducible, so an eigenvector for the eigenvalue $\rho(B(1)_S) = 1$ has no zero entries.) This can be checked in (strongly) polynomial time by standard methods.

Once we have identified the critical SCCs, it is straightforward to compute the critical depth in linear time in the size of the DAG of SCCs by a traversal of the DAG in topological order. □
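The criticality test in the proof amounts to a feasibility check for a small linear system. A minimal sketch using exact rational Gaussian elimination (our choice of "standard method"); it assumes $q^*_S = 1$ has already been established and that the entries of $B(1)_S$ are rationals, and `is_consistent` and `scc_is_critical` are hypothetical names:

```python
from fractions import Fraction


def is_consistent(rows, rhs):
    """Gauss-Jordan elimination over the rationals: does rows * u = rhs have a solution?"""
    m = [list(r) + [b] for r, b in zip(rows, rhs)]
    n_cols = len(rows[0])
    pivot_row = 0
    for col in range(n_cols):
        pr = next((i for i in range(pivot_row, len(m)) if m[i][col] != 0), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        for i in range(len(m)):
            if i != pivot_row and m[i][col] != 0:
                f = m[i][col] / m[pivot_row][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[pivot_row])]
        pivot_row += 1
    # The system is inconsistent iff some row has reduced to 0 = nonzero.
    return all(any(x != 0 for x in row[:-1]) or row[-1] == 0 for row in m)


def scc_is_critical(B1_S):
    """For an SCC S with q*_S = 1: critical iff {B(1)_S u = u, u_1 = 1} is solvable."""
    k = len(B1_S)
    rows = [[Fraction(B1_S[i][j]) - (1 if i == j else 0) for j in range(k)] for i in range(k)]
    rows.append([Fraction(1)] + [Fraction(0)] * (k - 1))  # normalisation u_1 = 1
    rhs = [Fraction(0)] * k + [Fraction(1)]
    return is_consistent(rows, rhs)
```

For instance, $B(1)_S = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}$ is reported critical (with eigenvector $(1, 1)$), while $B(1)_S = (1/2)$ is not.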

Proposition 6. A PPS $x = P(x)$ is critical if and only if at least one of its SCCs is critical.

Proof. (Only if): Suppose first that the PPS is critical, i.e., that $\rho(B(q^*)) = 1$. Let $v \ge 0$, $v \ne 0$, be an eigenvector of $B(q^*)$ for the eigenvalue 1, i.e., $B(q^*)v = v$. Let $S$ be a lowest SCC that contains a variable with nonzero value in $v$, i.e. $v_S \ne 0$ and $v_{D(S)} = 0$. Then $v_S = B(q^*)_{S, S \cup D(S)} \cdot v_{S \cup D(S)} = B(q^*)_S \cdot v_S$. Thus, $v_S$ is an eigenvector of $B(q^*)_S$ with eigenvalue 1, hence $\rho(B(q^*)_S) \ge 1$, and since we always have $\rho(B(q^*)_S) \le 1$, it follows that $S$ is a critical SCC.

(If): Conversely, suppose that there is a critical SCC, and let $S$ be a highest critical SCC in the DAG of SCCs. Then $\rho(B(q^*)_S) = 1$. Let $u \ge 0$ be an eigenvector of $B(q^*)_S$ with eigenvalue 1. Let $E(S)$ be the (possibly empty) set of variables which depend on variables in $S$ but are not themselves in $S$. If $E(S) = \emptyset$, then let $v$ be a vector with $v_S = u$ and $v_i = 0$ for all variables $x_i \notin S$. Then $B(q^*)v = v$, i.e., $v$ is an eigenvector of $B(q^*)$ with eigenvalue 1, hence $\rho(B(q^*)) \ge 1$ and the PPS is critical.

Suppose that $E(S)$ is nonempty. Then $E(S)$ contains no critical SCCs by our choice of $S$. This implies, by our proof above for the (only if) direction, that the PPS $x_{E(S)} = P(x_{E(S)}, q^*_{D(E(S))})_{E(S)}$ is not critical, i.e., $\rho(B(q^*)_{E(S)}) < 1$. Thus, $(I - B(q^*)_{E(S)})^{-1}$ exists. Let $v$ be the vector with $v_S = u$, $v_{E(S)} = (I - B(q^*)_{E(S)})^{-1} B(q^*)_{E(S),S} \cdot u$, and $v_i = 0$ for all $x_i$ not in either $S$ or $E(S)$.

We claim that $B(q^*)v = v$. If $x_i$ does not depend on a variable in $S$, then any $x_j$ which $x_i$ depends on also does not depend on $S$ and so has $v_j = 0$. So $(B(q^*)v)_i = 0 = v_i$. Next we consider $(B(q^*)v)_S$. Since $D(S)$ is disjoint from $S$ and $E(S)$, $v_{D(S)} = 0$. So $(B(q^*)v)_S = B(q^*)_S \cdot v_S = v_S$. Lastly consider $(B(q^*)v)_{E(S)}$:
$$\begin{aligned}
(B(q^*)v)_{E(S)} &= B(q^*)_{E(S)} \cdot v_{E(S)} + B(q^*)_{E(S),S} \cdot v_S \\
&= v_{E(S)} - (I - B(q^*)_{E(S)}) \cdot v_{E(S)} + B(q^*)_{E(S),S} \cdot v_S \\
&= v_{E(S)} - B(q^*)_{E(S),S} \cdot v_S + B(q^*)_{E(S),S} \cdot v_S \\
&= v_{E(S)}.
\end{aligned}$$
So $B(q^*)v = v$. Therefore, $\rho(B(q^*)) \ge 1$, and hence the PPS is critical. □



In the remainder of this section we will prove Theorem 3, and along the way, we will also establish Theorem 2. The proof of Theorem 3 is long and involved. We first need to recall, and establish, a series of Lemmas and Theorems.

Lemma 3. (Lemma C.3 of [6]) If $A$ is a non-negative matrix, and the vector $u > 0$ is such that $Au \le u$ and $\|u\|_\infty \le 1$, and $\alpha, \beta \in (0, 1)$ are constants such that for every $i \in \{1, \ldots, n\}$, one of the following two conditions holds:

(I) $(Au)_i \le (1 - \beta)\, u_i$,

(II) there is some $k$, $1 \le k \le n$, and some $j$, such that $(A^k)_{ij} \ge \alpha$ and $(Au)_j \le (1 - \beta)\, u_j$,

then $(I - A)$ is non-singular, $\rho(A) < 1$*, and
$$\|(I - A)^{-1}\|_\infty \le \ldots$$



Lemma 4. (Lemma A.4 of [7]) Let $A$ be a non-singular $n \times n$ matrix with rational entries. If the product of the denominators of all these entries is $m$, then
$$\|A^{-1}\|_\infty \le n\, m\, \|A\|_\infty^{n}.$$

Lemma 5. (Lemma 5.4 from [4]; or see Lemma 3.7 from [7]) Let $x = P(x)$ be a monotone system of polynomial equations which has an LFP $q^*$. For any positive vector $d \in \mathbb{R}^n_{>0}$ that satisfies $B(q^*)d \le d$, any positive real value $\lambda > 0$, and any nonnegative vector $x \in \mathbb{R}^n_{\ge 0}$, if $q^* - x \le \lambda d$, and $(I - B(x))^{-1}$ exists and is nonnegative, then
$$q^* - \mathcal{N}(x) \le \frac{\lambda}{2}\, d.$$

Theorem 4. (Theorem 3.12 of [7]) For a PPS $x = P(x)$ in $n$ variables, in SNF form, with LFP $q^*$ such that $0 < q^* < 1$: for all $i = 1, \ldots, n$, $1 - q^*_i \ge 2^{-4|P|}$. In other words, $\|q^*\|_\infty \le 1 - 2^{-4|P|}$.

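Theorem 5. (Theorem 4.6 of [6])

(i) If $x = P(x)$ is a PPS with $q^* < 1$ and $0 \le y < 1$, then
$$\|(I - B(\tfrac12(y + q^*)))^{-1}\|_\infty \le 2^{10|P|} \max\{2(1 - y)_{\min}^{-1},\ 2^{|P|}\}.$$

(ii) If $x = P(x)$ is a strongly connected PPS with $q^* = 1$ and $0 \le y < 1$, then
$$\|(I - B(\tfrac12(y + 1)))^{-1}\|_\infty \le 2^{4|P|}\, \frac{2}{(1 - y)_{\min}}.$$

Here $(1 - y)_{\min} := \min_i (1 - y_i)$.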



* Although the fact that the conditions also imply that $\rho(A) < 1$ is not stated explicitly in Lemma C.3 of [6], it is indeed established in the proof in [6].



Lemma 6. If $x = P(x)$ is a strongly connected PPS (in SNF form), with Jacobian $B(x)$, and if $B(\frac12 \mathbf{1})\, v \le v$ for some vector $v > 0$, then $\frac{v_{\max}}{v_{\min}} \le 2^{|P|}$.

Proof. (This proof is a variant of that of Lemma 3.10 in [7].) Let $l = \arg\max_i v_i$, and let $k = \arg\min_j v_j$. Since $x = P(x)$ is in SNF form, every non-zero entry of the matrix $B(\frac12\mathbf{1})$ is either at least $1/2$ (for a type Q variable) or is a coefficient of some monomial in some polynomial $P(x)_i$ of $P(x)$. Moreover, $B(\frac12\mathbf{1})$ is irreducible. Calling the entries of $B(\frac12\mathbf{1})$ $b_{i,j}$, we have a sequence of distinct indices, $i_1, i_2, \ldots, i_m$, with $k = i_1$, $l = i_m$, $m \le n$, where each $b_{i_j i_{j+1}} > 0$. (Just take the "shortest positive path" from $k$ to $l$.) For any $j$:
$$v_{i_j} \ge (B(\tfrac12\mathbf{1})\, v)_{i_j} \ge b_{i_j i_{j+1}}\, v_{i_{j+1}}.$$
By simple induction: $v_k \ge \big(\prod_{j=1}^{m-1} b_{i_j i_{j+1}}\big)\, v_l$. Note that $|P|$ includes the encoding size of each positive coefficient of every polynomial $P(x)_i$. We argued before that each $b_{i_j i_{j+1}}$ is either a coefficient of $x = P(x)$ or is at least $1/2$. Furthermore, if we consider the equation $x_{i_j} = P(x)_{i_j}$, and denote its encoding size as $|P_{i_j}|$, then it is easy to see that $b_{i_j i_{j+1}} \ge 2^{-|P_{i_j}|}$, because either $b_{i_j i_{j+1}}$ appears in $P(x)_{i_j}$, or else $b_{i_j i_{j+1}} \ge 1/2$, but it is always the case that $|P_{i_j}| \ge 1$. Now, the $i_j$'s are distinct (because we are using a shortest path). Therefore, since $|P| = \sum_i |P_i|$, we must have $\prod_{j=1}^{m-1} b_{i_j i_{j+1}} \ge 2^{-|P|}$, and thus we have:
$$\frac{v_{\max}}{v_{\min}} = \frac{v_l}{v_k} \le 2^{|P|}. \qquad \square$$

Theorem 6. If $x = P(x)$ is an MPS with $n$ variables, with LFP $q^* \le 1$ and $\rho(B(q^*)) < 1$, and if we use any rounded-down Newton iteration method defined by $x^{[0]} := 0$ and, for all $k \ge 0$, $x^{[k+1]} := \max(0, \mathcal{N}(x^{[k]}) - e_k)$, where $e_k$ is some error vector such that $0 \le (e_k)_i \le 2^{-(h+2)}$ for all $i \in \{1, \ldots, n\}$, then for any $0 < \epsilon < 1$, $\|q^* - x^{[h+1]}\|_\infty \le \epsilon$, whenever the chosen parameter $h$ satisfies $h \ge \lceil \log \|(I - B(q^*))^{-1}\|_\infty + \log(1/\epsilon) \rceil$.

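Proof. We shall use Lemma 5 to prove this. We need to find a vector $v$, with $B(q^*)v \le v$ and $v > 0$, called a cone vector, such that we can bound the ratio $v_{\max}/v_{\min}$. Here $v_{\max} = \max_i v_i$, and $v_{\min} = \min_i v_i$. Since we know that $\rho(B(q^*)) < 1$, we have that $(I - B(q^*))$ is nonsingular, and $(I - B(q^*))^{-1} = \sum_{i=0}^\infty B(q^*)^i$. We simply take
$$v := \frac{1}{\|(I - B(q^*))^{-1}\|_\infty}\,(I - B(q^*))^{-1}\mathbf{1}$$
as our cone vector. Then $B(q^*)v = v - \frac{1}{\|(I - B(q^*))^{-1}\|_\infty}\mathbf{1} \le v$, and, since $B(q^*)^0 = I$,
$$v = \frac{1}{\|(I - B(q^*))^{-1}\|_\infty}\,\big(\mathbf{1} + B(q^*)\mathbf{1} + B(q^*)^2\mathbf{1} + \cdots\big) \ge \frac{1}{\|(I - B(q^*))^{-1}\|_\infty}\,\mathbf{1}.$$
The latter not only shows that $v > 0$, but also that $v_{\min} \ge 1/\|(I - B(q^*))^{-1}\|_\infty$. Recall that by definition, since $(I - B(q^*))^{-1}$ is non-negative, $\|(I - B(q^*))^{-1}\|_\infty$ is the maximum row sum of any row of $(I - B(q^*))^{-1}$, i.e., the largest entry of $(I - B(q^*))^{-1}\mathbf{1}$. It follows that $v_{\max} = \|v\|_\infty = 1$.

Now, $x^{[0]} := 0$ and $q^* \le 1$, so we know that $q^* - x^{[0]} \le \mathbf{1} \le \|(I - B(q^*))^{-1}\|_\infty\, v \le 2^h \epsilon\, v$ (by the definition of $h$). Now, for all $k \ge 0$,
$$e_k \le 2^{-(h+2)}\,\mathbf{1} \le \frac{\epsilon}{4\,\|(I - B(q^*))^{-1}\|_\infty}\,\mathbf{1} \le \frac{\epsilon}{4}\, v.$$
Applying Lemma 5, if $q^* - x^{[k]} \le \lambda v$, then $q^* - x^{[k+1]} \le q^* - \mathcal{N}(x^{[k]}) + e_k \le (\frac{\lambda}{2} + \frac{\epsilon}{4})\, v$. It follows by induction that, for all $k \ge 1$, $q^* - x^{[k]} \le (2^{h-k} + \frac12)\,\epsilon\, v$.

When $k = h + 1$, this gives $q^* - x^{[h+1]} \le \epsilon v$. Since $v_{\max} = \|v\|_\infty \le 1$, this means that $\|q^* - x^{[h+1]}\|_\infty \le \epsilon$, as required. □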
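The rounded-down Newton iteration of Theorem 6 is easy to run on a one-variable example. A sketch, assuming the hypothetical PPS $x = \frac12 x^2 + \frac14$ (so $B(x) = x$, $q^* = 1 - 1/\sqrt2$ and $\|(I - B(q^*))^{-1}\|_\infty = \sqrt2$), in which rounding down to a multiple of $2^{-(h+2)}$ plays the role of the error vector $e_k$; `rounded_newton_1d` is a hypothetical name:

```python
import math
from fractions import Fraction


def rounded_newton_1d(a, b, h):
    """Rounded-down Newton iteration for the one-variable PPS x = a*x^2 + b.

    x^{[0]} = 0 and x^{[k+1]} = max(0, N(x^{[k]}) - e_k), where
    N(x) = x + (P(x) - x) / (1 - B(x)) with B(x) = 2*a*x, and e_k is the amount
    removed by rounding down to a multiple of 2^{-(h+2)}, so 0 <= e_k < 2^{-(h+2)}.
    Returns x^{[h+1]} as an exact rational.
    """
    a, b = Fraction(a), Fraction(b)
    scale = 2 ** (h + 2)
    x = Fraction(0)
    for _ in range(h + 1):
        newton = x + (a * x * x + b - x) / (1 - 2 * a * x)
        x = max(Fraction(0), Fraction(math.floor(newton * scale), scale))
    return x
```

For $\epsilon = 2^{-20}$, Theorem 6 asks for $h \ge \lceil \log\sqrt2 + 20 \rceil = 21$, and `rounded_newton_1d(Fraction(1, 2), Fraction(1, 4), 21)` returns a dyadic rational within $2^{-20}$ of $q^* \approx 0.2929$. Exact rationals keep the only error the deliberate rounding, which matches the shape of the error vectors in the theorem.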

Theorem 7. If the PPS $x = P(x)$ with LFP solution $q^*$ has $\rho(B(q^*)) < 1$, and we use any rounded-down Newton iteration, starting at $x^{[0]} := 0$, defined by $x^{[k+1]} := \max(0, x^{[k]} + (I - B(x^{[k]}))^{-1}(P(x^{[k]}) - x^{[k]}) - e_k)$, for any error vectors $e_k$ where $0 \le (e_k)_i \le 2^{-(h+2)}$ for all $i \in \{1, \ldots, n\}$, then for any given $0 < \epsilon < 1$, $\|q^* - x^{[h+1]}\|_\infty \le \epsilon$, where $h = 14|P| + 3 + \lceil \log(1/\epsilon) \rceil$.

Theorem 7 follows from Theorem 6 and an upper bound on $\|(I - B(q^*))^{-1}\|_\infty$. The following Lemma gives us this, from which Theorem 7 follows immediately:

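Lemma 7. If the PPS $x = P(x)$ with LFP solution $q^*$ has $\rho(B(q^*)) < 1$, then
$$\|(I - B(q^*))^{-1}\|_\infty \le 2^{14|P|+3}.$$

Proof. We split into several cases, based on $q^*$.

Case 1: $q^* < 1$. In this case we just need to use Theorem 5(i), in which we set $y := q^*$, combined with Theorem 4 (which gives $(1 - q^*)_{\min}^{-1} \le 2^{4|P|}$), to conclude that:
$$\|(I - B(q^*))^{-1}\|_\infty \le 2^{10|P|} \max\{2 \cdot 2^{4|P|},\ 2^{|P|}\} = 2^{14|P|+1}.$$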

Case 2: $q^* = 1$. In this case we can instead use the following result from [7]:

Lemma 8. For a PPS $x = P(x)$, if $(I - B(1))$ is non-singular then
$$\|(I - B(1))^{-1}\|_\infty \le 3^n\, n\, 2^{|P|} \le 2^{3|P|}.$$

Proof. The proof of this is basically identical to a proof in [7] for a closely related fact, which was based on more assumptions (not all of which were needed).

If we take $(I - B(1))$ to be the matrix $A$ of Lemma 4, then, noting that the product of all the denominators in $(I - B(1))$ is at most $2^{|P|}$, this yields:
$$\|(I - B(1))^{-1}\|_\infty \le n\, 2^{|P|}\, \|I - B(1)\|_\infty^{n}.$$
Of course $\|I - B(1)\|_\infty \le 1 + \|B(1)\|_\infty \le 3$ (note that here we are using the fact that the system is in SNF normal form, so that $\|B(1)\|_\infty \le 2$). Thus
$$\|(I - B(1))^{-1}\|_\infty \le 3^n\, n\, 2^{|P|}.$$
Furthermore, as discussed in [7] (see Section A.6, first paragraph), for any PPS $x = P(x)$ we can assume wlog that the equation for every variable requires at least 3 bits, and thus that $|P| \ge 3n \ge n\log 3 + \log n$. Therefore $3^n\, n\, 2^{|P|} \le 2^{2|P|} \le 2^{3|P|}$.

□



Case 3: Neither $q^* < 1$ nor $q^* = 1$. To finish the proof of Lemma 7 we will combine the above two results for the first two cases to deal with the case when neither $q^* < 1$ nor $q^* = 1$, but nevertheless $\rho(B(q^*)) < 1$. (It is indeed possible for all three of these conditions to hold, when some coordinates of $q^*$ are 1, and others are less than 1.)

Let $A$ (for "always") denote the set of variables $x_i$ for which $q^*_i = 1$, and let $M$ (for "maybe") denote the set of variables $x_i$ for which $0 < q^*_i < 1$. We can obviously assume that both $A$ and $M$ are non-empty; otherwise one of the two cases above gives the result. Furthermore, variables in $A$ obviously cannot depend on those in $M$ (neither directly nor indirectly). Thus we can describe $B(q^*)$ by the following block decomposition:
$$B(q^*) = \begin{pmatrix} B(q^*)_M & B(q^*)_{M,A} \\ 0 & B(q^*)_A \end{pmatrix}$$

We need a lemma:



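Lemma 9. For any matrix $M$ satisfying the block decomposition $M = \begin{pmatrix} A & B \\ 0 & D \end{pmatrix}$, if both $A$ and $D$ are square and non-singular matrices, then $M$ is also non-singular, and:
$$\|M^{-1}\|_\infty \le \max\{\|A^{-1}\|_\infty + \|A^{-1}\|_\infty \|B\|_\infty \|D^{-1}\|_\infty,\ \|D^{-1}\|_\infty\}.$$

Proof. The standard formula for the blockwise inverse of a matrix gives
$$\begin{pmatrix} A & B \\ 0 & D \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} & -A^{-1} B D^{-1} \\ 0 & D^{-1} \end{pmatrix},$$
provided that $A$ and $D$ are non-singular. (The formula can easily be verified directly by multiplying by $\begin{pmatrix} A & B \\ 0 & D \end{pmatrix}$.)

Now recall that the $\ell_\infty$ norm for a matrix $C$ is $\|C\|_\infty = \max_i \sum_j |C_{ij}|$, i.e., it is the maximum sum across any row of the absolute values of the entries of the row. So, using $\|XY\|_\infty \le \|X\|_\infty \|Y\|_\infty$ for the top-right block,
$$\|M^{-1}\|_\infty \le \max\{\|A^{-1}\|_\infty + \|A^{-1}\|_\infty \|B\|_\infty \|D^{-1}\|_\infty,\ \|D^{-1}\|_\infty\}.$$

□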
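The block-inverse formula and the norm bound of Lemma 9 can be checked numerically. A small sketch with exact rationals; the helper names are hypothetical, and Gauss-Jordan elimination is used only to obtain a reference inverse to compare against the block formula:

```python
from fractions import Fraction


def inverse(M):
    """Gauss-Jordan inverse over the rationals (M assumed square and non-singular)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]


def inf_norm(M):
    """Maximum over rows of the sum of the absolute values of the entries."""
    return max(sum(abs(x) for x in row) for row in M)


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def block_upper(A, B, D):
    """Assemble M = [[A, B], [0, D]] from its blocks."""
    zeros = [[0] * len(A[0]) for _ in D]
    return [ra + rb for ra, rb in zip(A, B)] + [z + rd for z, rd in zip(zeros, D)]
```

For $A = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}$, $B = \begin{pmatrix} 1 \\ 3 \end{pmatrix}$ and $D = (4)$, the computed $M^{-1}$ matches the block formula, and $\|M^{-1}\|_\infty = 7/4$, which in this instance equals the bound of Lemma 9.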

Now,
$$I - B(q^*) = \begin{pmatrix} I - B(q^*)_M & -B(q^*)_{M,A} \\ 0 & I - B(q^*)_A \end{pmatrix},$$
so by Lemma 9,
$$\|(I - B(q^*))^{-1}\|_\infty \le \max\{\|(I - B(q^*)_M)^{-1}\|_\infty + \|(I - B(q^*)_M)^{-1}\|_\infty \|B(q^*)_{M,A}\|_\infty \|(I - B(q^*)_A)^{-1}\|_\infty,\ \|(I - B(q^*)_A)^{-1}\|_\infty\}.$$

Since we always wlog assume that $x = P(x)$ is a PPS in SNF normal form, $\|B(q^*)\|_\infty \le 2$. More specifically, $\|B(q^*)_{M,A}\|_\infty \le 2$. By Case 1, since $0 < q^*_M < 1$, $\|(I - B(q^*)_M)^{-1}\|_\infty \le 2^{14|P_M|+1}$, where $|P_M|$ denotes the encoding size of the system of equations $x_M = P(x_M, x_A)_M$, restricted to the variables in $M$, and with 1 plugged in for all variables in $A$. Also, by Lemma 8, since $q^*_A = 1$, $\|(I - B(q^*)_A)^{-1}\|_\infty \le 2^{3|P_A|}$, where $x_A = P(x)_A$ denotes the system of equations restricted to the variables in $A$ (note that these do not depend on variables in $M$). Thus,
$$\|(I - B(q^*))^{-1}\|_\infty \le \max\{2^{14|P_M|+1} + 2^{14|P_M|+2+3|P_A|},\ 2^{3|P_A|}\}.$$
Since $|P_M| + |P_A| \le |P|$, this can be simplified to $\|(I - B(q^*))^{-1}\|_\infty \le 2^{14|P|+3}$. This completes the proof of Lemma 7. □

We now have enough to deal with the non-critical case:

Theorem 2. For any $\epsilon > 0$, and for an SCFG, $G$, if the PPS $x = P_G(x)$ has LFP $0 < q^G \le 1$ and $\rho(B_G(q^G)) < 1$, then if we use R-NM with parameter $h + 2$ to approximate the LFP solution of the MPS $y = P_{G\otimes D}(y)$, we have $\|q^{G\otimes D} - y^{[h+1]}\|_\infty \le \epsilon$, where $h := 14|G| + 3 + \lceil \log(1/\epsilon) + \log d \rceil$.

Thus we can compute the probability $q^{G,D} = \sum_{t \in F} q^{G\otimes D}_{(s_0 S t)}$ within additive error $\delta > 0$ in time polynomial in the input size: $|G|$, $|D|$ and $\log(1/\delta)$, in the standard Turing model of computation.

Proof. Lemma 1 yields that $(I - B_{G\otimes D}(q^{G\otimes D}))^{-1} \in \mathfrak{B}^{\times}_{\ge 0}$, and that $\mathfrak{C}((I - B_{G\otimes D}(q^{G\otimes D}))^{-1}) = (I - B_G(q^G))^{-1}$. Lemma 2(vi) relates the norms: $\|(I - B_{G\otimes D}(q^{G\otimes D}))^{-1}\|_\infty \le d\,\|(I - B_G(q^G))^{-1}\|_\infty$. We need a bound on the latter norm. Lemma 7 shows $\|(I - B_G(q^G))^{-1}\|_\infty \le 2^{14|G|+3}$. So $\|(I - B_{G\otimes D}(q^{G\otimes D}))^{-1}\|_\infty \le d\, 2^{14|G|+3}$. Plugging this bound into Theorem 6 yields the result. □

To deal with critical SCCs, we need a way to analyse how an error in the LFP $q^*$ inside one SCC, $S$, where $q^*_S = 1$, affects those SCCs that depend on it:

Theorem 8. Given a PPS, $y = P(y)$, in SNF form, let $x$ be a subvector of $y$ whose equations, when restricting $y = P(y)$ to the variables in $x$, are $x = P(x, y_{D(x)})$. If we let $y_{D(x)} := z$ for a real-valued vector $0 \le z \le 1$, and if the resulting PPS, $x = P(x, z)$, has LFP $q^*_z > 0$, and if $q^*_1$ is the LFP solution of $x = P(x, 1)$ (note that $q^*_1 \ge q^*_z$), then:

(i) If $q^*_1 < 1$, then $\|q^*_1 - q^*_z\|_\infty \le 2^{14|P|+2}\,\|1 - z\|_\infty$.

(ii) If the PPS $x = P(x, 1)$ is strongly connected and $q^*_1 = 1$, then $\|1 - q^*_z\|_\infty \le 2^{3|P|}\sqrt{\|1 - z\|_\infty}$.

(iii) If the PPS $x = P(x, 1)$ is strongly connected and $q^*_1 = 1$, and $\rho(B(1, 1)) < 1$, then $\|1 - q^*_z\|_\infty \le 2^{3|P|}\,\|1 - z\|_\infty$.

Bad examples given in [4] (see also [20]) show that there are critical PPSs with $q^*_1 = 1$ for which $\|1 - q^*_z\|_\infty$ is of the order of $\sqrt{\|1 - z\|_\infty}$. Thus we cannot hope to get a bound linear in $\|1 - z\|_\infty$ in all cases. Cases (i) and (iii) of Theorem 8 say that we can get a linear bound except for critical PPSs, where we indeed need a square root in the strongly connected case (case (ii)).
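The contrast between the square-root and the linear regimes is already visible with one variable. A sketch using two hypothetical PPSs (not the examples cited above): $x = \frac12 x^2 + \frac12 z$, which is critical at $z = 1$, and $x = \frac14 x^2 + \frac34 z$, which is not; in both the LFP at $z = 1$ is $1$, and the LFP for $z < 1$ is approximated by floating-point Newton iteration from $0$:

```python
import math


def lfp_newton_1d(a, b, iters=200):
    """Approximate the LFP of the one-variable PPS x = a*x^2 + b by Newton's method from 0.

    Assumes a > 0, b >= 0 and that the LFP is < 1, so that 1 - 2*a*x stays positive.
    """
    x = 0.0
    for _ in range(iters):
        x = x + (a * x * x + b - x) / (1 - 2 * a * x)
    return x


delta = 1e-6  # z = 1 - delta
q_crit = lfp_newton_1d(0.5, 0.5 * (1 - delta))   # critical at z = 1: 1 - q is about sqrt(delta)
q_non = lfp_newton_1d(0.25, 0.75 * (1 - delta))  # non-critical at z = 1: 1 - q is about 1.5 * delta
```

With $z = 1 - \delta$, the first system has LFP exactly $1 - \sqrt\delta$, while the second has LFP $2 - \sqrt{1 + 3\delta} \approx 1 - \frac32\delta$; shrinking $\delta$ by a factor of 100 therefore shrinks $1 - q^*_z$ by only a factor of 10 in the critical case, but by a factor of about 100 in the non-critical case.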



Proof (of Theorem 8). We first prove the following:

Lemma 10. For $0 \le z \le z' \le 1$, and for all $0 \le x \le 1$, $\|P(x, z') - P(x, z)\|_\infty \le 2\,\|z - z'\|_\infty$.

Proof. Consider the $k$'th coordinate, $P(x, y)_k$, of the PPS polynomials $P(x, y)$, in SNF form. We distinguish cases based on the type of $x_k$. If $x_k$ has type Q, then $P(x, z)_k$ and $P(x, z')_k$ both have the form $x_i x_j$, or both have the form $z_i x_j$, or both the form $x_i z_j$, or both the form $z_i z_j$. Thus, since $0 \le z \le z' \le 1$ and $0 \le x \le 1$, we have $0 \le P(x, z')_k - P(x, z)_k \le z'_i z'_j - z_i z_j \le 2\,\|z - z'\|_\infty$. In the case where $x_k$ has type L, we have $0 \le P(x, z')_k - P(x, z)_k \le \sum_j p_{k,j}(z'_j - z_j) \le \|z - z'\|_\infty$, because the coefficients $p_{k,j}$ of the type L equation must sum to at most 1.

Finally, if $x_k$ has type T, $P(x, z)_k$ and $P(x, z')_k$ are equal constants, so their difference is 0. □

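Lemma 11. If $x = P(x, z)$ is a PPS with LFP $q^*_z > 0$ and $x = P(x, z')$ has LFP $q^*_{z'} > 0$, for some $0 \le z \le z' \le 1$, and $(I - B(\frac12(q^*_z + q^*_{z'}), z'))$ is non-singular, then
$$\|q^*_{z'} - q^*_z\|_\infty \le 2\,\|(I - B(\tfrac12(q^*_z + q^*_{z'}), z'))^{-1}\|_\infty\, \|z' - z\|_\infty.$$

Proof. From Lemma 4.3 of [6], applied to the PPS $x = P(x, z')$ (where we let $y := q^*_z$), we have:
$$q^*_{z'} - q^*_z = (I - B(\tfrac12(q^*_z + q^*_{z'}), z'))^{-1}\,(P(q^*_z, z') - q^*_z).$$
We can take norms:
$$\|q^*_{z'} - q^*_z\|_\infty \le \|(I - B(\tfrac12(q^*_z + q^*_{z'}), z'))^{-1}\|_\infty\, \|P(q^*_z, z') - q^*_z\|_\infty.$$
Now, since $q^*_z = P(q^*_z, z)$, we just apply Lemma 10 to obtain that $\|P(q^*_z, z') - q^*_z\|_\infty \le 2\,\|z' - z\|_\infty$.

□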

To get parts (i) and (ii) of Theorem 8 we apply Theorem 5. For establishing (i) of Theorem 8, we need to apply (i) of Theorem 5 to the PPS $x = P(x, 1)$, with $y := q^*_z$. This gives
$$\|(I - B(\tfrac12(q^*_z + q^*_1), 1))^{-1}\|_\infty \le 2^{10|P|} \max\{2(1 - q^*_z)_{\min}^{-1},\ 2^{|P|}\}.$$
Now, since in part (i) of Theorem 8 we are given that $q^*_1 < 1$, we know that $0 < q^*_z \le q^*_1 \le 1 - 2^{-4|P|}$, by Theorem 3.12 of [7]. So we have
$$\|(I - B(\tfrac12(q^*_z + q^*_1), 1))^{-1}\|_\infty \le 2^{14|P|+1}.$$
Lemma 11 now tells us that:
$$\|q^*_1 - q^*_z\|_\infty \le 2^{14|P|+2}\,\|1 - z\|_\infty.$$
This finishes the proof of part (i) of Theorem 8.

To prove part (ii) of Theorem 8, first recall that we assume $x = P(x,1)$ is strongly connected. We use part (ii) of Theorem 5.

By assumption, $q^*_1 = 1$. Applying it with $y := q^*_z$ gives:

$$\|(I - B(\tfrac12(1 + q^*_z), 1))^{-1}\|_\infty \le \frac{2^{4|P|+1}}{(1 - q^*_z)_{\min}} \qquad (3)$$

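Now

$$B(\tfrac12\mathbf{1}, 1)(1 - q^*_z) \le B(\tfrac12(1 + q^*_z), 1)(1 - q^*_z) = P(1,1) - P(q^*_z, 1) \le P(1,1) - P(q^*_z, z) = 1 - q^*_z,$$

where the equality is by Lemma 3.3 of [?]. Now we apply Lemma 6, letting $u$ be $1 - q^*_z$ in the statement of that Lemma, and considering $B(\frac12\mathbf{1}, 1)$ in place of the $B(\frac12\mathbf{1})$ in the statement of the Lemma.

This tells us that $\|1 - q^*_z\|_\infty / (1 - q^*_z)_{\min} \le 2^{|P|}$.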

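Now, if we substitute this into inequality (3), we get

$$\|(I - B(\tfrac12(1 + q^*_z), 1))^{-1}\|_\infty \le \frac{2^{5|P|+1}}{\|1 - q^*_z\|_\infty}$$

Lemma 11 now gives:

$$\|1 - q^*_z\|_\infty \le 2\|(I - B(\tfrac12(1 + q^*_z), 1))^{-1}\|_\infty\,\|1 - z\|_\infty$$

Inserting our bound for the norm of $(I - B(\frac12(1 + q^*_z), 1))^{-1}$ gives:

$$\|1 - q^*_z\|_\infty \le \frac{2^{5|P|+2}}{\|1 - q^*_z\|_\infty}\,\|1 - z\|_\infty$$

Re-arranging and taking the square root gives:

$$\|1 - q^*_z\|_\infty \le \sqrt{2^{5|P|+2}\,\|1 - z\|_\infty}$$

As long as the encoding size is $|P| \ge 2$, which we can clearly assume, we have $(5|P| + 2)/2 \le 3|P|$, and hence:

$$\|1 - q^*_z\|_\infty \le 2^{3|P|}\sqrt{\|1 - z\|_\infty}$$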
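The square root in part (ii) is genuinely needed. A minimal sketch with the toy critical PPS $x = \frac12x^2 + \frac12z$ (our own example, not in SNF): here $q^*_1 = 1$ and $B(1,1) = 1$, and the LFP is $q^*_z = 1 - \sqrt{1 - z}$, so $\|1 - q^*_z\|_\infty = \sqrt{\|1 - z\|_\infty}$ exactly. The last line checks the exponent inequality $(5|P| + 2)/2 \le 3|P|$ used above for $|P| \ge 2$.

```python
import math

def lfp_critical(z):
    # hypothetical toy: x = 0.5*x^2 + 0.5*z; least root of 0.5*x^2 - x + 0.5*z = 0
    return 1 - math.sqrt(1 - z)

for gap in (1e-2, 1e-4, 1e-6):
    # the error is about sqrt(gap), far larger than gap itself
    print(gap, 1 - lfp_critical(1 - gap))

# exponent step used above: sqrt(2^(5|P|+2)) <= 2^(3|P|) whenever |P| >= 2
assert all((5 * p + 2) / 2 <= 3 * p for p in range(2, 1000))
```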

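For part (iii), the significance of the condition that $\rho(B(1,1)) < 1$ is that it implies that $(I - B(1,1))^{-1}$ exists and, since $0 \le B(\frac12(1 + q^*_z), 1) \le B(1,1)$, also that $(I - B(\frac12(1 + q^*_z), 1))^{-1}$ exists and $(I - B(1,1))^{-1} \ge (I - B(\frac12(1 + q^*_z), 1))^{-1}$ (compare the two Neumann series term by term). So, we use a bound on $\|(I - B(1,1))^{-1}\|_\infty$.

Lemma 11 gives:

$$\|1 - q^*_z\|_\infty \le 2\|(I - B(\tfrac12(1 + q^*_z), 1))^{-1}\|_\infty\,\|1 - z\|_\infty$$

Now $\|(I - B(\frac12(1 + q^*_z), 1))^{-1}\|_\infty \le \|(I - B(1,1))^{-1}\|_\infty$. We can apply Lemma 8 to the PPS $x = P(x,1)$, which yields $\|(I - B(1,1))^{-1}\|_\infty \le 2^{3|P|}$. Now we have

$$\|1 - q^*_z\|_\infty \le 2^{3|P|+1}\|1 - z\|_\infty$$

as required. □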
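By contrast, in the non-critical case (iii) the dependence is linear. A sketch with the toy PPS $x = 0.3x^2 + 0.7z$ (again our own example): at $z = 1$ we have $q^*_1 = 1$ and $B(1,1) = 0.6 < 1$, so the argument above gives $\|1 - q^*_z\|_\infty \le 2(1 - 0.6)^{-1}\|1 - z\|_\infty = 5\|1 - z\|_\infty$; numerically the ratio approaches $0.7/0.4 = 1.75$.

```python
import math

A_COEF, B_COEF = 0.3, 0.7   # hypothetical toy PPS x = A_COEF*x^2 + B_COEF*z

def lfp(z):
    """Least fixed point: the smaller root of A_COEF*x^2 - x + B_COEF*z = 0."""
    return (1 - math.sqrt(1 - 4 * A_COEF * B_COEF * z)) / (2 * A_COEF)

inv_norm = 1 / (1 - 2 * A_COEF)      # ||(I - B(1,1))^{-1}||_inf for this 1x1 Jacobian
for gap in (1e-2, 1e-4, 1e-6):
    err = 1 - lfp(1 - gap)
    assert err <= 2 * inv_norm * gap  # Lemma 11 plus the comparison with B(1,1)
    print(gap, err / gap)             # bounded ratio, close to 1.75
```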

Theorem 9. Suppose $x = P(x)$ is a PPS in SNF form that has critical depth at most $c$. Let $\delta \in \mathbb{R}$ be such that $0 < \delta \le 2^{-3|P|}$. Suppose that in every bottom-critical SCC of $x = P(x)$ we reduce a single positive coefficient, $p$, by setting it to $p' = p(1 - \delta)$, resulting in the PPS $x = P_\delta(x)$. Then $\|q^* - q^*_\delta\|_\infty \le 2^{14|P|+2}\delta^{(1/2^c)}$, where $q^*$ and $q^*_\delta$ are the LFP solutions of $x = P(x)$ and $x = P_\delta(x)$, respectively. Furthermore, $\|(I - B_\delta(q^*_\delta))^{-1}\|_\infty \le 2^{8|P|+2}\delta^{-3}$.
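The exponent $1/2^c$ is attained by simple examples. The sketch below uses a toy SNF system of our own with critical depth $c = 2$: a bottom-critical SCC $\{x_0, x_1\}$ (with $x_0 = x_1x_1$, $x_1 = \frac12x_0 + \frac12$) and a second critical SCC $\{x_2, x_3\}$ (with $x_2 = x_3x_3$, $x_3 = \frac12x_2 + \frac12x_1$) above it. Reducing the coefficient $\frac12$ of $x_0$ by the factor $1 - \delta$ gives $(q^*_\delta)_1 = 1/(1 + \sqrt\delta)$ and $(q^*_\delta)_3 = 1 - \sqrt{1 - (q^*_\delta)_1}$ in closed form, so $\|q^* - q^*_\delta\|_\infty$ scales like $\delta^{1/4}$. It also computes $\|(I - B_\delta(q^*_\delta))^{-1}\|_\infty$ with NumPy; for this toy it grows like $\delta^{-3/4}$, well within the $\delta^{-3}$ of the theorem.

```python
import numpy as np

def perturbed_lfp(delta):
    """Closed-form LFP of the toy SNF system after the reduction (delta = 0: original).

        x0 = x1 * x1                     (type Q)
        x1 = 0.5*(1 - delta)*x0 + 0.5    (type L, reduced coefficient)
        x2 = x3 * x3                     (type Q)
        x3 = 0.5*x2 + 0.5*x1             (type L)
    """
    x1 = 1 / (1 + np.sqrt(delta))        # least root of 0.5(1-delta)x^2 - x + 0.5 = 0
    x3 = 1 - np.sqrt(1 - x1)             # least root of 0.5x^2 - x + 0.5*x1 = 0
    return np.array([x1 * x1, x1, x3 * x3, x3])

def jacobian(q, delta):
    B = np.zeros((4, 4))
    B[0, 1] = 2 * q[1]                   # d(x1*x1)/dx1
    B[1, 0] = 0.5 * (1 - delta)
    B[2, 3] = 2 * q[3]                   # d(x3*x3)/dx3
    B[3, 2] = 0.5
    B[3, 1] = 0.5
    return B

def inf_norm(M):
    return np.max(np.abs(M).sum(axis=1))

for delta in (1e-4, 1e-8, 1e-12):
    q = perturbed_lfp(delta)
    err = np.max(1 - q)                  # q* = (1, 1, 1, 1) when delta = 0
    inv = np.linalg.inv(np.eye(4) - jacobian(q, delta))
    print(f"delta={delta:.0e}  err/delta^(1/4)={err / delta ** 0.25:.3f}  "
          f"norm*delta^(3/4)={inf_norm(inv) * delta ** 0.75:.3f}")
```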

Proof. If $c = 0$, we have no critical SCCs, so we don't change any coefficients, and $q^* = q^*_\delta$, and the remaining claim about $\|(I - B_\delta(q^*_\delta))^{-1}\|_\infty$ follows directly from Lemma 7.

So, we can assume $c > 0$ in the rest of the proof. To establish that $q^*$ and $q^*_\delta$ are close, we will use Theorem 8. For any SCC, $S$, of a PPS $x = P(x)$, either $q^*_S = 1$ or $q^*_S < 1$ (in every coordinate), because every variable in $S$ depends (directly or indirectly) on every other, so if any of them is $< 1$, then so are all the others.

Let $S$ be an SCC with $q^*_S = 1$ and with $(q^*_\delta)_S < 1$. The SCC $S$ necessarily only depends on SCCs, $T$, with $q^*_T = 1$, because otherwise we wouldn't have $q^*_S = 1$. We want to show that

$$\|1 - (q^*_\delta)_S\|_\infty \le \delta^{(1/2^{c_{S\cup D(S)}})}\,2^{6|P_{S\cup D(S)}|}$$

where $c_{S\cup D(S)}$ is the critical depth in $x_{S\cup D(S)} = P_{S\cup D(S)}(x_{S\cup D(S)})$, and $|P_{S\cup D(S)}|$ denotes the encoding size of the latter PPS. To prove this by induction, we can assume

$$\|1 - (q^*_\delta)_{D(S)}\|_\infty \le \delta^{(1/2^{c_{D(S)}})}\,2^{6|P_{D(S)}|} \qquad (4)$$

The base case is when $S$ is a bottom-critical SCC, that does not depend on any other critical SCCs. Then, even if $D(S)$ is non-empty, $(q^*_\delta)_{D(S)} = q^*_{D(S)}$. However, we do change a single coefficient $p$ in $S$, by setting it to $p' = p(1 - \delta)$. Note that because the PPS is in SNF form, $p$ must appear in an equation $x_i = P(x_S, 1)_i$ where $x_i$ is of type L, and thus the coefficient $p$ appears in a single term $px_j$. We wish to consider a new PPS in SNF form, parametrized by the possible values $z \in \{(1 - \delta), 1\}$ that we multiply $p$ by. To do this, we can simply add a new variable $x_{n+1}$ (for this particular SCC, $S$), and we then replace the term $px_j$ by $px_{n+1}$, and we add a new equation $x_{n+1} = zx_j$ to our system of equations. We denote this new PPS by $(x_S, x_{n+1}) = Q_S((x_S, x_{n+1}), z)$. Note that this is indeed an SNF form PPS for either $z \in \{(1 - \delta), 1\}$. Note also that in terms of encoding size, we have $|Q_S| \le 2|P_S|$.

The LFP solution of $(x_S, x_{n+1}) = Q_S((x_S, x_{n+1}), 1)$, in the $S$ coordinates, has $q^*_S = 1$, and the LFP solution of $(x_S, x_{n+1}) = Q_S((x_S, x_{n+1}), 1 - \delta)$ in the $S$ coordinates is $(q^*_\delta)_S$. Thus, by Theorem 8 (ii), we get $\|1 - (q^*_\delta)_S\|_\infty \le 2^{3|Q_S|}\sqrt{\delta} \le 2^{6|P_S|}\sqrt{\delta}$. In this case $c_{S\cup D(S)} = 1$, so this is enough to establish the inductive claim.
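The construction of $Q_S$ is purely syntactic. A minimal sketch, using our own dictionary-based encoding of SNF equations (type L as `("L", {j: p_ij}, const)`, type Q as `("Q", a, b)` with operands `("x", j)` or the parameter `("z", 0)`); the function name `add_parameter_variable` is hypothetical. It checks, with exact rationals, that once $x_{n+1}$ takes its forced value $zx_j$, the new system agrees with the original one in which $p$ has been multiplied by $z$.

```python
from fractions import Fraction

def add_parameter_variable(eqs, i, j):
    """Replace the term p*x_j of type-L equation i by p*x_new and append the
    SNF equation x_new = z * x_j (z is the parameter). Returns a new system."""
    kind, terms, const = eqs[i]
    assert kind == "L" and j in terms
    new = len(eqs)                      # the new variable (x_{n+1} in the text; 0-based here)
    terms = dict(terms)
    terms[new] = terms.pop(j)
    out = list(eqs)
    out[i] = ("L", terms, const)
    out.append(("Q", ("z", 0), ("x", j)))
    return out

def evaluate(eqs, x, z):
    """Evaluate P(x, z) for ("L", {k: p}, const) and ("Q", a, b) equations."""
    def val(op):
        return x[op[1]] if op[0] == "x" else z
    res = []
    for eq in eqs:
        if eq[0] == "L":
            res.append(sum(p * x[k] for k, p in eq[1].items()) + eq[2])
        else:
            res.append(val(eq[1]) * val(eq[2]))
    return res

# Toy SCC: x0 = x1*x1 (type Q), x1 = 1/2 x0 + 1/2 (type L); p = 1/2 multiplies x0.
EQS = [("Q", ("x", 1), ("x", 1)), ("L", {0: Fraction(1, 2)}, Fraction(1, 2))]
z = 1 - Fraction(1, 1000)                     # z = 1 - delta with delta = 1/1000
Q = add_parameter_variable(EQS, i=1, j=0)
x = [Fraction(3, 5), Fraction(4, 5)]
x_ext = x + [z * x[0]]                        # give x_{n+1} its forced value z * x_j
reduced = [EQS[0], ("L", {0: Fraction(1, 2) * z}, Fraction(1, 2))]
print(evaluate(Q, x_ext, z)[:2] == evaluate(reduced, x, z))   # True
```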

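Next, suppose that $S$ is a critical SCC that depends on a different critical SCC. $q^*_S$ is the LFP solution of $x_S = P_S(x_S, q^*_{D(S)})$ and $(q^*_\delta)_S$ is the LFP solution of $x_S = P_S(x_S, (q^*_\delta)_{D(S)})$. By Theorem 8 (ii), $\|1 - (q^*_\delta)_S\|_\infty \le 2^{3|P_S|}\sqrt{\|1 - (q^*_\delta)_{D(S)}\|_\infty}$. Substituting using the inductive assumption in inequality (4) gives:

$$\begin{aligned}
\|1 - (q^*_\delta)_S\|_\infty &\le 2^{3|P_S|}\sqrt{\|1 - (q^*_\delta)_{D(S)}\|_\infty}\\
&\le 2^{3|P_S|}\sqrt{\delta^{(1/2^{c_{D(S)}})}\,2^{6|P_{D(S)}|}}\\
&= 2^{3|P_S| + 3|P_{D(S)}|}\,\delta^{(1/2^{c_{D(S)}+1})}\\
&\le \delta^{(1/2^{c_{S\cup D(S)}})}\,2^{3|P_{S\cup D(S)}|}
\end{aligned}$$

The last inequality holds because $c_{S\cup D(S)} = c_{D(S)} + 1$. This is because $S$ is itself a critical SCC. Note also that $|P_{S\cup D(S)}| = |P_S| + |P_{D(S)}|$, since $x_S = P(x_S, x_{D(S)})_S$ and $x_{D(S)} = P(x_{D(S)})_{D(S)}$ are disjoint subsets of the equations in $x = P(x)$. Since $2^{3|P_{S\cup D(S)}|} \le 2^{6|P_{S\cup D(S)}|}$, this establishes the inductive claim.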

Finally, suppose that $S$ is not a critical SCC but does have $q^*_S = 1$ and depends on some critical SCC. Again $q^*_S$ is the LFP solution of $x_S = P_S(x_S, q^*_{D(S)})$ and $(q^*_\delta)_S$ is the LFP solution of $x_S = P_S(x_S, (q^*_\delta)_{D(S)})$. By Theorem 8 (iii): $\|1 - (q^*_\delta)_S\|_\infty \le 2^{3|P_S|+1}\|1 - (q^*_\delta)_{D(S)}\|_\infty$. Substituting the inductive assumption (4) gives $\|1 - (q^*_\delta)_S\|_\infty \le 2^{3|P_S|+1+6|P_{D(S)}|}\delta^{(1/2^{c_{D(S)}})}$, which simplifies to $\|1 - (q^*_\delta)_S\|_\infty \le \delta^{(1/2^{c_{S\cup D(S)}})}2^{6|P_{S\cup D(S)}|}$. This is because $S$ itself is non-critical, so $c_{D(S)} = c_{S\cup D(S)}$, and because $3|P_S| + 1 \le 6|P_S|$.

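Let $A$ (for "always") denote the set of variables $x_i$ for which $q^*_i = 1$, and let $M$ (for "maybe") denote the set of variables $x_i$ for which $0 < q^*_i < 1$. $A$ is non-empty as otherwise we would have no critical SCCs. Every variable $x_i$ in $A$ is part of some SCC $S$ with $q^*_S = 1$, and such an $S$ only depends on variables in $A$. So our induction has already given that

$$\|1 - (q^*_\delta)_A\|_\infty \le \delta^{1/2^c}\,2^{6|P_A|}$$

where $|P_A|$ denotes the encoding size of the equations for the variables in $A$. If $M$ is empty, this bound on $\|q^* - q^*_\delta\|_\infty$ is enough. Otherwise we have to use Theorem 8 (i), applied to the PPS $x_M = P(x_M, x_A)_M$ with the variables $x_A$ as parameters (no coefficient of an equation for $M$ has been reduced, since the reduced coefficients lie in bottom-critical SCCs, which are contained in $A$). This gives that $\|q^*_M - (q^*_\delta)_M\|_\infty \le 2^{14|P_M|+2}\|1 - (q^*_\delta)_A\|_\infty$. Substituting gives $\|q^*_M - (q^*_\delta)_M\|_\infty \le 2^{14|P_M| + 6|P_A| + 2}\delta^{1/2^c}$. Since $|P_M| + |P_A| \le |P|$, we have now shown that

$$\|q^* - q^*_\delta\|_\infty \le 2^{14|P|+2}\delta^{1/2^c}$$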

The only thing left to complete the proof of Theorem 9 is to get a bound on $\|(I - B_\delta(q^*_\delta))^{-1}\|_\infty$. For this we will use the techniques of the proof of Theorem 1. Call the set of variables $x_i$ for which $(q^*_\delta)_i = 1$, $A_\delta$, and the set of variables $x_i$ for which $0 < (q^*_\delta)_i < 1$, $M_\delta$. Since $q^*_\delta \le q^*$, $M \subseteq M_\delta$ and $A_\delta \subseteq A$. It is worth noting that variables belonging to critical SCCs are in $A \cap M_\delta$. We will first show that if a variable $x_i$ depends (directly or indirectly) on some variable $x_j$ for which we have reduced a coefficient in $P_\delta(x)_j$, then $(q^*_\delta)_i \le 1 - 2^{-|P|}\delta$. For any such $x_i$, consider a shortest sequence $x_{i_0}, x_{i_1}, \ldots, x_{i_m}$ such that (1): $i_0 = j$ and $P_\delta(x)_j$ has a reduced coefficient in it, (2): $i_m = i$, and (3): for every $0 \le k < m$, $P_\delta(x)_{i_{k+1}}$ contains a term with $x_{i_k}$. There is some term $p_{j,h}x_h$ in $P(x)_j$ which has been changed to $p_{j,h}(1 - \delta)x_h$ in $P_\delta(x)_j$. Since $x = P(x)$ is a PPS, $P(1)_j \le 1$, but note that $P_\delta(x)_j$ is not proper, as indeed we must have that $P_\delta(1)_j \le P(1)_j - p_{j,h}\delta \le 1 - p_{j,h}\delta$. Also note that $(q^*_\delta)_j = P_\delta(q^*_\delta)_j \le P_\delta(1)_j \le 1 - p_{j,h}\delta$. For any $0 \le k < m$, if $x_{i_{k+1}}$ has type Q, then $(q^*_\delta)_{i_{k+1}} \le (q^*_\delta)_{i_k}$. If $x_{i_{k+1}}$ has type L, then $1 - (q^*_\delta)_{i_{k+1}} \ge p_{i_{k+1},i_k}(1 - (q^*_\delta)_{i_k})$. By an easy induction, $1 - (q^*_\delta)_i \ge \big(\prod_{\{k \mid x_{i_{k+1}} \text{ has type L}\}} p_{i_{k+1},i_k}\big)(1 - (q^*_\delta)_j)$. Thus:

$$1 - (q^*_\delta)_i \ge \Big(\prod_{\{k \mid x_{i_{k+1}} \text{ has type L}\}} p_{i_{k+1},i_k}\Big)\,p_{j,h}\,\delta$$

Since this is the shortest sequence satisfying the stated conditions, for any $0 < k \le m$, $P_\delta(x)_{i_k}$ has not had any coefficients reduced, and furthermore the $x_{i_k}$'s are all distinct variables. So all these coefficients $p_{i_{k+1},i_k}$ and $p_{j,h}$ are distinct coefficients in $x = P(x)$. The encoding size $|P|$ is at least the number of bits describing these rationals $p_{i_{k+1},i_k}$ and $p_{j,h}$ (and a positive rational described with $b$ bits is at least $2^{-b}$), and thus

$$(q^*_\delta)_i \le 1 - 2^{-|P|}\delta$$
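The shortest-sequence argument can be mirrored by a breadth-first search over the dependency graph. The sketch below (toy system, encoding and helper names are our own assumptions) reduces the coefficient $p_{j,h} = \frac12$ of $x_0$ in the equation of $x_j = x_1$, computes $q^*_\delta$ by Kleene iteration from $0$ (which converges geometrically here because the perturbed system is non-critical), and checks $1 - (q^*_\delta)_i \ge (\prod p_{i_{k+1},i_k})\,p_{j,h}\,\delta$ along BFS shortest chains, where the product ranges over the type L steps.

```python
from collections import deque

DELTA = 0.01
# Hypothetical SNF toy system after the reduction:
EQS = {
    0: ("Q", 1, 1),                          # x0 = x1 * x1
    1: ("L", {0: 0.5 * (1 - DELTA)}, 0.5),   # x1 = 0.5(1 - delta) x0 + 0.5   (reduced, j = 1)
    2: ("L", {1: 0.25}, 0.75),               # x2 = 0.25 x1 + 0.75
    3: ("Q", 2, 2),                          # x3 = x2 * x2
}

def step(x):
    out = []
    for i in range(len(EQS)):
        eq = EQS[i]
        if eq[0] == "Q":
            out.append(x[eq[1]] * x[eq[2]])
        else:
            out.append(sum(p * x[k] for k, p in eq[1].items()) + eq[2])
    return out

def lfp(iters=5000):
    x = [0.0] * len(EQS)                     # Kleene iteration from 0 increases to the LFP
    for _ in range(iters):
        x = step(x)
    return x

def chain_bounds(j, p_jh, delta):
    """Lower bounds on 1 - (q*_delta)_i along BFS shortest chains starting at x_j."""
    users = {k: [] for k in EQS}             # users[k] = variables whose equation mentions x_k
    for i, eq in EQS.items():
        deps = set(eq[1]) if eq[0] == "L" else {eq[1], eq[2]}
        for k in deps:
            users[k].append(i)
    bound = {j: p_jh * delta}                # 1 - (q*_delta)_j >= p_{j,h} * delta
    queue = deque([j])
    while queue:
        k = queue.popleft()
        for i in users[k]:
            if i not in bound:
                eq = EQS[i]
                # a type-L step multiplies by its (unreduced) coefficient, a type-Q step by 1
                bound[i] = bound[k] * (eq[1][k] if eq[0] == "L" else 1.0)
                queue.append(i)
    return bound

q = lfp()
bounds = chain_bounds(j=1, p_jh=0.5, delta=DELTA)
for i in sorted(bounds):
    print(i, 1 - q[i], bounds[i])
```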

Next we show that the PPS $x = P_\delta(x)$ is non-critical. Suppose, for a contradiction, that $x = P_\delta(x)$ is critical. Then it has some critical SCC $S$. But then $S$ must have also been an SCC in the PPS $x = P(x)$, because the dependency graphs of these PPSs are the same (we never reduce a positive probability to 0). For $S$ to be a critical SCC in $x = P_\delta(x)$, we must have that $(q^*_\delta)_S = 1$ and $\rho(B_\delta(1)_S) = 1$. However, $q^* \ge q^*_\delta$ and $\rho(B(1)_S) \ge \rho(B_\delta(1)_S) = 1$. So $q^*_S = 1$. Lemma 6.5 of [8] shows that for any strongly connected PPS, $x = P(x)$, with Jacobian $B(x)$, and with LFP $q^*$, if $x < q^*$, then $\rho(B(x)) < 1$. Thus, by continuity of eigenvalues, $\rho(B(q^*)) \le 1$. Applying this to the strongly connected PPS $x_S = P(x_S, 1)_S$, since $q^*_S = 1$, we get $\rho(B(1)_S) \le 1$. Thus $\rho(B(1)_S) = 1$, i.e. $S$ is a critical SCC of $x = P(x)$. Either $S$ is a bottom-critical SCC or it depends on some bottom-critical SCC. So every variable $x_i$ in $S$ depends on some variable $x_j$ for which we have reduced a coefficient in $P_\delta(x)_j$. So for every $x_i$ in $S$, $(q^*_\delta)_i \le 1 - 2^{-|P|}\delta$. But this contradicts our earlier assertion that $(q^*_\delta)_S = 1$.

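$B_\delta(q^*_\delta)$ has the block decomposition

$$B_\delta(q^*_\delta) = \begin{pmatrix} B_\delta(q^*_\delta)_{M_\delta} & B_\delta(q^*_\delta)_{M_\delta,A_\delta}\\ 0 & B_\delta(q^*_\delta)_{A_\delta}\end{pmatrix}$$

It is possible that $A_\delta$ is empty, in which case the bound we will obtain on $\|(I - B_\delta(q^*_\delta)_{M_\delta})^{-1}\|_\infty$ will be enough to show the theorem. So we suppose here that $A_\delta$ is non-empty. $M_\delta$ is non-empty since we assumed that we have at least one critical SCC.

We need to show that both $I - B_\delta(q^*_\delta)_{M_\delta}$ and $I - B_\delta(q^*_\delta)_{A_\delta}$ are nonsingular, and we need to get upper bounds on $\|(I - B_\delta(q^*_\delta)_{M_\delta})^{-1}\|_\infty$ and $\|(I - B_\delta(q^*_\delta)_{A_\delta})^{-1}\|_\infty$. Once we do so, we can then apply Lemma 5 to get a bound on $\|(I - B_\delta(q^*_\delta))^{-1}\|_\infty$.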

First, let us show that $I - B_\delta(q^*_\delta)_{A_\delta}$ is non-singular, and also bound $\|(I - B_\delta(q^*_\delta)_{A_\delta})^{-1}\|_\infty$.

We note that $P(x)_{A_\delta} = P_\delta(x)_{A_\delta}$. We have shown that any variable $x_i$ for which we have reduced a coefficient in $P_\delta(x)_i$ has $(q^*_\delta)_i \le 1 - 2^{-|P|}\delta$, and so $x_i$ is not in $A_\delta$. Thus the equations in $x_{A_\delta} = P_\delta(x_{A_\delta})_{A_\delta}$ are a subset of the equations $x = P(x)$, and so the encoding size of this PPS is at most $|P|$. We have also shown that the PPS $x = P_\delta(x)$ is non-critical. So we can apply Lemma 8 to the PPS $x_{A_\delta} = P_\delta(x_{A_\delta})_{A_\delta}$, which gives $\|(I - B_\delta(q^*_\delta)_{A_\delta})^{-1}\|_\infty \le 2^{3|P|}$.



Now, let us show that $I - B_\delta(q^*_\delta)_{M_\delta}$ is non-singular, and also bound $\|(I - B_\delta(q^*_\delta)_{M_\delta})^{-1}\|_\infty$.

Consider the PPS restricted to the variables in $M_\delta$. Note that no variable in $A_\delta$ can depend on these. Thus, restricting the PPS $x = P_\delta(x)$ to the variables in $M_\delta$ (with the variables in $A_\delta$ set to 1) defines a PPS $x_{M_\delta} = P_\delta(x_{M_\delta}, 1)_{M_\delta}$. Note that the LFP of this is $(q^*_\delta)_{M_\delta} < 1$, by definition of $M_\delta$. To simplify notation in the current argument, we shall denote this PPS by $y = R(y)$, and we shall use $r^* := (q^*_\delta)_{M_\delta}$ to denote its LFP. Furthermore, let us use $B_R(y)$ to denote its Jacobian. We note, firstly, that $B_R(r^*) = B_\delta(q^*_\delta)_{M_\delta}$. The way to see this is to note that $q^*_\delta = (r^*, 1)$, and so the entries of both matrices are $\frac{\partial P_\delta(x)_i}{\partial x_j}(q^*_\delta)$ for $x_i, x_j \in M_\delta$.

So, rephrased, we want to show $\rho(B_R(r^*)) < 1$, and we want to find a bound on $\|(I - B_R(r^*))^{-1}\|_\infty$. To do this, we need to follow the proof of Theorem 5 (i) in the case $y = r^*$. (That Theorem was proved in [6].)



We need to use Lemma 3 with $A = B_R(r^*)$ and $u = 1 - r^*$. By Lemma 3.5 of [?], $B_R(r^*)(1 - r^*) \le 1 - r^*$. We want to find any $\beta$ so that condition (I) of Lemma 3 applies to variables $y_i$ such that either $y_i$ has type Q or else $R(1)_i < 1$. Namely, for such variables $y_i$, it should be the case that $(B_R(r^*)(1 - r^*))_i \le (1 - \beta)(1 - r^*)_i$.

Let us first note that, for any $y_i$, $r^*_i \le 1 - 2^{-|P|}\delta$. We have shown that if a variable $x_i$ depends on some variable $x_j$ for which we have reduced a coefficient in $P_\delta(x)_j$, then $(q^*_\delta)_i \le 1 - 2^{-|P|}\delta$. If $x_i \in M_\delta$ depends on no such variables, then $x_i \in M$. But then we have $(q^*_\delta)_i \le q^*_i \le 1 - 2^{-4|P|} \le 1 - 2^{-|P|}\delta$, because we assumed that $\delta \le 2^{-3|P|}$. So for any $x_i \in M_\delta$, $(q^*_\delta)_i \le 1 - 2^{-|P|}\delta$.



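In the case where $y_i = R(y)_i$ has type Q, for some $y_j, y_k$, $R(y)_i = y_jy_k$, and so

$$\begin{aligned}
(B_R(r^*)(1 - r^*))_i &= r^*_k(1 - r^*_j) + r^*_j(1 - r^*_k)\\
&= r^*_j + r^*_k - 2r^*_jr^*_k\\
&= (1 - r^*_jr^*_k) - (1 + r^*_jr^*_k - r^*_j - r^*_k)\\
&= (1 - r^*_i) - (1 - r^*_j)(1 - r^*_k)\\
&= (1 - r^*_i) - \tfrac12\big((1 - r^*_j)(1 - r^*_k) + (1 - r^*_j)(1 - r^*_k)\big)\\
&\le (1 - r^*_i) - \tfrac12\cdot 2^{-|P|}\delta\big((1 - r^*_j) + (1 - r^*_k)\big)\\
&\le (1 - r^*_i) - \tfrac12\cdot 2^{-|P|}\delta\big((1 - r^*_j) + (1 - r^*_k) - (1 - r^*_j)(1 - r^*_k)\big)\\
&= (1 - \tfrac12\cdot 2^{-|P|}\delta)(1 - r^*_i)
\end{aligned}$$

where the first inequality uses $1 - r^*_j \ge 2^{-|P|}\delta$ and $1 - r^*_k \ge 2^{-|P|}\delta$, and the final equality uses $1 - r^*_i = (1 - r^*_j) + (1 - r^*_k) - (1 - r^*_j)(1 - r^*_k)$.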
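A quick randomized check of this chain of (in)equalities, under the hypotheses used above ($r^*_i = r^*_jr^*_k$ and $r^*_j, r^*_k \le 1 - 2^{-|P|}\delta$); the parameter values and the function name are arbitrary choices for illustration.

```python
import random

def check_type_q(size_bits, delta, trials=10000, seed=1):
    """Check (B_R(r*)(1-r*))_i <= (1 - beta)(1 - r*_i), beta = 2^-(|P|+1) delta,
    for a type-Q variable r*_i = r*_j r*_k with r*_j, r*_k <= 1 - 2^-|P| delta."""
    rng = random.Random(seed)
    cap = 1 - 2.0 ** (-size_bits) * delta
    beta = 2.0 ** (-(size_bits + 1)) * delta
    for _ in range(trials):
        rj, rk = rng.uniform(0, cap), rng.uniform(0, cap)
        ri = rj * rk
        lhs = rk * (1 - rj) + rj * (1 - rk)          # row i of B_R(r*) applied to 1 - r*
        # the identity (B_R(r*)(1-r*))_i = (1 - r*_i) - (1 - r*_j)(1 - r*_k)
        assert abs(lhs - ((1 - ri) - (1 - rj) * (1 - rk))) < 1e-12
        if lhs > (1 - beta) * (1 - ri) + 1e-14:      # small slack for rounding
            return False
    return True

print(check_type_q(4, 0.5), check_type_q(10, 1e-3))   # True True
```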

Some variables $x_i$ with $P_\delta(1)_i < 1$ have $P(1)_i < 1$, in which case $P(1)_i \le 1 - 2^{-|P|}$. If a variable $x_i$ has $P_\delta(1)_i < 1$ but $P(1)_i = 1$, then we have reduced some coefficient in $P_\delta(x)_i$ by multiplying it by $1 - \delta$, so we have $P_\delta(1)_i \le 1 - 2^{-|P|}\delta$. So for any $y_i$ with $R(1)_i < 1$, $R(1)_i \le 1 - 2^{-|P|}\delta$. So if $R(1)_i < 1$,

$$\begin{aligned}
(B_R(r^*)(1 - r^*))_i &\le (B_R(\tfrac12(1 + r^*))(1 - r^*))_i\\
&\le (R(1))_i - (R(r^*))_i\\
&\le (1 - 2^{-|P|}\delta) - r^*_i\\
&\le (1 - 2^{-|P|}\delta)(1 - r^*_i)
\end{aligned}$$

So condition (I) of Lemma 3, with $\beta = 2^{-(|P|+1)}\delta$, applies to variables $y_i$ which either have type Q or have $R(1)_i < 1$.

It remains to find an $\alpha$ such that condition (II) of Lemma 3 applies to the $y_i$ which have type L and satisfy $R(1)_i = 1$. (Note that there aren't any variables of type T in $M_\delta$, and thus none in $y$.) We need the following Lemma from [6]:

Lemma 12. (Lemma C.8 of [6]) For any PPS, $x = P(x)$, with LFP $0 < q^* < 1$, for any variable $x_i$ either

(I) the equation $x_i = P(x)_i$ is of type Q, or else $P(1)_i < 1$, or

(II) $x_i$ depends on a variable $x_j$ such that $x_j = P(x)_j$ is of type Q, or else $P(1)_j < 1$.

So, given $y_i$ of type L and with $R(1)_i = 1$, there is a sequence $y_{i_0}, y_{i_1}, \ldots, y_{i_m}$ with $i_m = i$, with $y_{i_0}$ of type Q or $R(1)_{i_0} < 1$, and, for every $0 \le k < m$, $R(y)_{i_{k+1}}$ contains a term with $y_{i_k}$. Without loss of generality, we consider the shortest such sequence. Then for $0 < k \le m$, $y_{i_k}$ does not have type Q, so it must have type L. Also $R(1)_{i_k} = 1$. So $R(y)_{i_k}$ contains a term $p_{i_k,i_{k-1}}y_{i_{k-1}}$. We have that $B_R(r^*)_{i_k,i_{k-1}} = p_{i_k,i_{k-1}}$. Because $R(1)_{i_k} = 1$, this term has not been reduced in $P_\delta$, so $p_{i_k,i_{k-1}}$ is a coefficient in $x = P(x)$. That this is the shortest sequence implies that each of these is a distinct coefficient in $x = P(x)$. So $\prod_{k=0}^{m-1}p_{i_{k+1},i_k} \ge 2^{-|P|}$. Now $(B_R(r^*)^m)_{i,i_0} \ge \prod_{k=0}^{m-1}B_R(r^*)_{i_{k+1},i_k} = \prod_{k=0}^{m-1}p_{i_{k+1},i_k} \ge 2^{-|P|}$.

So condition (II) of Lemma 3 applies to $y_i$ of type L with $R(1)_i = 1$ when $\alpha = 2^{-|P|}$.
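The search for the chain $y_{i_0}, \ldots, y_{i_m}$ and the resulting lower bound $(B_R(r^*)^m)_{i,i_0} \ge \prod_k p_{i_{k+1},i_k}$ can be sketched as a breadth-first search from the indices that satisfy condition (I). The $4 \times 4$ matrix below is a made-up Jacobian (row 0 plays the role of a type Q row, rows 1 to 3 are type L rows whose coefficients sum to 1); it only illustrates the certificate and does not come from an actual PPS in the text.

```python
from collections import deque

def alpha_certificates(B, good):
    """BFS from the indices in `good` along edges k -> i with B[i][k] > 0.
    Returns {i: (i0, m, prod)} for i outside `good`, where prod is the product of
    the entries along a shortest chain i0 -> ... -> i of length m."""
    cert = {i0: (i0, 0, 1.0) for i0 in good}
    queue = deque(good)
    while queue:
        k = queue.popleft()
        src, m, prod = cert[k]
        for i in range(len(B)):
            if i not in cert and B[i][k] > 0:
                cert[i] = (src, m + 1, prod * B[i][k])
                queue.append(i)
    return {i: c for i, c in cert.items() if i not in good}

def mat_pow(B, m):
    """B^m by repeated multiplication (plain lists, no dependencies)."""
    n = len(B)
    R = [[float(r == c) for c in range(n)] for r in range(n)]
    for _ in range(m):
        R = [[sum(R[r][t] * B[t][c] for t in range(n)) for c in range(n)] for r in range(n)]
    return R

B = [
    [0.0, 0.85, 0.9, 0.0],   # y0 = y1 * y2 (type Q), evaluated at some made-up r*
    [0.5, 0.0, 0.0, 0.5],    # y1 = 0.5 y0 + 0.5 y3
    [0.0, 0.3, 0.0, 0.7],    # y2 = 0.3 y1 + 0.7 y3
    [0.0, 0.0, 1.0, 0.0],    # y3 = y2
]
certs = alpha_certificates(B, good=[0])
for i, (i0, m, prod) in sorted(certs.items()):
    print(i, i0, m, prod, mat_pow(B, m)[i][i0])   # (B^m)[i][i0] >= prod
alpha = min(prod for _, _, prod in certs.values())
print(alpha)  # 0.15
```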

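We can now use Lemma 3 with $A = B_R(r^*)$, $u = 1 - r^*$, $\alpha = 2^{-|P|}$ and $\beta = 2^{-(|P|+1)}\delta$, giving an explicit bound on $\|(I - B_R(r^*))^{-1}\|_\infty$ in terms of these quantities, $n$, and $(1 - r^*)_{\min}$.

We have argued that $(1 - r^*)_{\min} \ge 2^{-|P|}\delta$. Using $n \le 2^{|P|}$ as a (very) conservative bound on $n$, we have:

$$\|(I - B_\delta(q^*_\delta)_{M_\delta})^{-1}\|_\infty \le 2^{5|P|}\delta^{-3} \qquad (5)$$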

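If $A_\delta$ is empty, then $B_\delta(q^*_\delta) = B_\delta(q^*_\delta)_{M_\delta}$, and so we are done.

Otherwise we appeal to Lemma 5 with the block decomposition

$$I - B_\delta(q^*_\delta) = \begin{pmatrix} I - B_\delta(q^*_\delta)_{M_\delta} & -B_\delta(q^*_\delta)_{M_\delta,A_\delta}\\ 0 & I - B_\delta(q^*_\delta)_{A_\delta}\end{pmatrix}$$

Letting $Z = (I - B_\delta(q^*_\delta)_{M_\delta})$ and applying Lemma 5, we get:

$$\|(I - B_\delta(q^*_\delta))^{-1}\|_\infty \le \max\Big\{\|Z^{-1}\|_\infty + \|Z^{-1}\|_\infty\,\|B_\delta(q^*_\delta)_{M_\delta,A_\delta}\|_\infty\,\|(I - B_\delta(q^*_\delta)_{A_\delta})^{-1}\|_\infty,\ \|(I - B_\delta(q^*_\delta)_{A_\delta})^{-1}\|_\infty\Big\}$$

and $\|(I - B_\delta(q^*_\delta)_{A_\delta})^{-1}\|_\infty \le 2^{3|P|}$ and $\|B_\delta(q^*_\delta)_{M_\delta,A_\delta}\|_\infty \le 2$ (each row of the Jacobian of an SNF PPS, evaluated at a point $\le 1$, sums to at most 2). Combining with the bound (5), we get:

$$\|(I - B_\delta(q^*_\delta))^{-1}\|_\infty \le \max\{2^{5|P|}\delta^{-3} + 2^{5|P|}\delta^{-3}\,2^{3|P|}\cdot 2,\ 2^{3|P|}\}$$

Or, more simply, $\|(I - B_\delta(q^*_\delta))^{-1}\|_\infty \le 2^{8|P|+2}\delta^{-3}$. □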
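The block step rests on the formula $\begin{pmatrix} Z & -C\\ 0 & W\end{pmatrix}^{-1} = \begin{pmatrix} Z^{-1} & Z^{-1}CW^{-1}\\ 0 & W^{-1}\end{pmatrix}$, whose row sums give the maximum used above. A NumPy sketch that checks this bound on random instances; the random substochastic blocks (so that $Z = I - B_M$ and $W = I - B_A$ are invertible), the off-diagonal block with entries in $[0, 2)$, and the sizes are arbitrary assumptions.

```python
import numpy as np

def inf_norm(M):
    """Maximum absolute row sum, i.e. the induced infinity norm."""
    return np.max(np.abs(M).sum(axis=1))

def block_bound(Z, C, W):
    """max(||Z^-1|| + ||Z^-1|| ||C|| ||W^-1||, ||W^-1||) for M = [[Z, -C], [0, W]]."""
    Zi, Wi = np.linalg.inv(Z), np.linalg.inv(W)
    return max(inf_norm(Zi) + inf_norm(Zi) * inf_norm(C) * inf_norm(Wi), inf_norm(Wi))

rng = np.random.default_rng(0)
for _ in range(200):
    a, b = rng.integers(1, 5, size=2)
    Bm = rng.random((a, a))
    Bm *= 0.9 / Bm.sum(axis=1, keepdims=True)      # row sums 0.9, so I - Bm is invertible
    Ba = rng.random((b, b))
    Ba *= 0.9 / Ba.sum(axis=1, keepdims=True)
    C = 2 * rng.random((a, b))                      # off-diagonal block, entries in [0, 2)
    Z, W = np.eye(a) - Bm, np.eye(b) - Ba
    M = np.block([[Z, -C], [np.zeros((b, a)), W]])
    assert inf_norm(np.linalg.inv(M)) <= block_bound(Z, C, W) * (1 + 1e-9)
print("bound held on all random instances")
```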

We are finally ready to prove Theorem 3, to which this entire section was dedicated.

Theorem 3. For any $\epsilon > 0$, and for any SCFG, $G$, in SNF form, with $q^G > 0$, with critical depth $c(G)$, consider the new SCFG, $G'$, obtained from $G$ by the following process: for each bottom-critical SCC, $S$, of $x = P_G(x)$, find any rule $r = A \to B$ of $G$ such that $A$ and $B$ are both in $S$ (since $G$ is in SNF, such a rule must exist in every critical SCC). Reduce its probability $p$, by setting it to $p' = p(1 - 2^{-(14|G|+3)2^{c(G)}}\epsilon^{2^{c(G)}})$. Do this for all bottom-critical SCCs. This defines $G'$, which is non-critical.

Using $G'$ instead of $G$, if we apply R-NM, with parameter $h + 2$, to approximate the LFP solution $q^{G'\otimes D}$ of the MPS $y = P_{G'\otimes D}(y)$, then $\|q^{G\otimes D} - x^{[h+1]}\|_\infty \le \epsilon$, where $h := \lceil \log d + (3\cdot 2^{c(G)} + 1)(\log(1/\epsilon) + 14|G| + 3)\rceil$.

Thus we can compute the probability $P_G(L(D))$ that $G$ generates a string in $L(D)$ within additive error $\delta > 0$ in time polynomial in $|G|$, $|D|$, $\log(1/\delta)$, and $2^{c(G)}$, in the standard Turing model of computation.



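Proof (of Theorem 3).

Note that for an SCFG, $G$, and its corresponding PPS, $x = P_G(x)$, the bit encoding size of $G$ is at least as big as that of the PPS. In other words, we have $|G| \ge |P_G|$. So, we can apply Theorem 9 to the PPS $x = P_G(x)$ with $\delta := 2^{-(14|G|+3)2^{c(G)}}\epsilon^{2^{c(G)}}$, yielding that $\|q^G - q^{G'}\|_\infty \le \frac{\epsilon}{2}$ (indeed, $2^{14|G|+2}\delta^{1/2^{c(G)}} = 2^{14|G|+2}\,2^{-(14|G|+3)}\epsilon = \frac{\epsilon}{2}$) and $\|(I - B_{G'}(q^{G'}))^{-1}\|_\infty \le 2^{8|G|+2+3(14|G|+3)2^{c(G)}}\epsilon^{-3\cdot 2^{c(G)}}$. Now Lemma 1 and Lemma 2 (vi) allow us to convert this bound on $\|(I - B_{G'}(q^{G'}))^{-1}\|_\infty$ to a bound on $\|(I - B_{G'\otimes D}(q^{G'\otimes D}))^{-1}\|_\infty$. Namely:

$$\|(I - B_{G'\otimes D}(q^{G'\otimes D}))^{-1}\|_\infty \le d\,2^{8|G|+2+3(14|G|+3)2^{c(G)}}\epsilon^{-3\cdot 2^{c(G)}}$$

Now Theorem 5 gives that $\|q^{G'\otimes D} - x^{[h+1]}\|_\infty \le \frac{\epsilon}{2}$, since $h \ge \log\|(I - B_{G'\otimes D}(q^{G'\otimes D}))^{-1}\|_\infty + \log(2/\epsilon)$. Thus

$$\|q^{G\otimes D} - x^{[h+1]}\|_\infty \le \|q^{G\otimes D} - q^{G'\otimes D}\|_\infty + \|q^{G'\otimes D} - x^{[h+1]}\|_\infty \le \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,$$

where the bound on the first term follows from $\|q^G - q^{G'}\|_\infty \le \frac{\epsilon}{2}$ by Lemma 1 and Lemma 2.

□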
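The choice of $\delta$ and $h$ can be checked mechanically. The sketch below assumes that $\log$ denotes $\log_2$ and that $d$ is the number of states of the DFA $D$ (both our reading of the notation); it works with $\log_2\delta$ to avoid floating-point underflow and checks the two facts used in the proof: $2^{14|G|+2}\delta^{1/2^{c(G)}} = \epsilon/2$, and $h \ge \log\big(d\,2^{8|G|+2+3(14|G|+3)2^{c(G)}}\epsilon^{-3\cdot 2^{c(G)}}\big) + \log(2/\epsilon)$. The function names and the sample parameter values are hypothetical.

```python
import math

def theorem3_parameters(g_size, crit_depth, eps, d):
    """Return (log2(delta), h) as chosen in Theorem 3, assuming log = log2."""
    k = 2 ** crit_depth
    log2_delta = -(14 * g_size + 3) * k + k * math.log2(eps)
    h = math.ceil(math.log2(d) + (3 * k + 1) * (math.log2(1 / eps) + 14 * g_size + 3))
    return log2_delta, h

def check(g_size, crit_depth, eps, d):
    k = 2 ** crit_depth
    log2_delta, h = theorem3_parameters(g_size, crit_depth, eps, d)
    # Theorem 9 bound 2^(14|G|+2) * delta^(1/2^c) should equal eps/2
    log2_err = 14 * g_size + 2 + log2_delta / k
    assert abs(log2_err - math.log2(eps / 2)) < 1e-9
    # log2 of the bound on ||(I - B_{G' x D}(q))^{-1}||, plus log2(2/eps), must not exceed h
    log2_inv = (math.log2(d) + 8 * g_size + 2 + 3 * (14 * g_size + 3) * k
                + 3 * k * math.log2(1 / eps))
    assert h >= log2_inv + math.log2(2 / eps)
    return h

for g_size, crit_depth, eps, d in [(10, 0, 1e-3, 3), (50, 2, 1e-6, 5), (200, 4, 1e-9, 10)]:
    print(g_size, crit_depth, eps, d, check(g_size, crit_depth, eps, d))
```

With these definitions the margin $h - \log\|(I - B)^{-1}\|_\infty - \log(2/\epsilon)$ is at least $6|G|$, so the rounding up in $h$ never matters.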



C Proof of Proposition 4

Recall that, for a string $\alpha \in (V \cup \Sigma)^*$, with $n = |V|$, $\kappa(\alpha)$ is the $n$-vector where, for $A \in V$, $\kappa_A(\alpha)$ is the number of times $A$ appears in $\alpha$. Recall that we define $C(r,\pi)$ to be the number of times the rule $r$ is used in the derivation $\pi$, and we define $C(A,\pi) = \sum_{r\in R_A}C(r,\pi)$. For $A \in V$, define $e^A$ to be the unit $n$-vector with $(e^A)_A = 1$ and $(e^A)_B = 0$ for $B \ne A$. Define $\kappa(\pi) = \sum_A C(A,\pi)e^A$.

Recall that when doing parameter estimation (and EM) we use the formula

$$p(A \to \gamma) = \frac{\sum_\pi \mathcal{P}(\pi)\,C(A\to\gamma,\pi)}{\sum_\pi \mathcal{P}(\pi)\,C(A,\pi)}$$

to obtain (or update) the probabilities of rules in $G$.

Recall that $\mathcal{P}(\pi)$ is a probability distribution on the complete derivations of the grammar that start at a designated start nonterminal, $S$. Again, this equation only makes sense when the sums $\sum_\pi \mathcal{P}(\pi)C(A,\pi)$ are finite and nonzero, which we assume; we also assume every non-terminal and rule of $H$ appears in some complete derivation $\pi$ with $\mathcal{P}(\pi) > 0$.
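A minimal sketch of the estimation formula, with exact rationals. The grammar (rules $S \to Ab \mid a$ and $A \to Sa \mid c$) and the distribution $\mathcal{P}$ over three complete derivations are made up for illustration; $\mathcal{P}$ is simply given as a table here rather than computed from data.

```python
from collections import Counter
from fractions import Fraction

# Made-up grammar: rule name -> (lhs, rhs); upper-case letters are non-terminals.
RULES = {"r1": ("S", "Ab"), "r2": ("S", "a"), "r3": ("A", "Sa"), "r4": ("A", "c")}
# Made-up distribution P(pi) over three complete derivations from S (lists of rules used).
DERIVS = [
    (Fraction(1, 2), ["r2"]),               # S => a
    (Fraction(1, 4), ["r1", "r4"]),         # S => Ab => cb
    (Fraction(1, 4), ["r1", "r3", "r2"]),   # S => Ab => Sab => aab
]

def estimate(rules, derivs):
    """p(A -> gamma) = sum_pi P(pi) C(A->gamma, pi) / sum_pi P(pi) C(A, pi)."""
    rule_mass, nt_mass = Counter(), Counter()
    for prob, pi in derivs:
        for r, cnt in Counter(pi).items():
            rule_mass[r] += prob * cnt
            nt_mass[rules[r][0]] += prob * cnt
    return {r: rule_mass[r] / nt_mass[rules[r][0]] for r in rules}

probs = estimate(RULES, DERIVS)
print(probs)  # {'r1': Fraction(2, 5), 'r2': Fraction(3, 5), 'r3': Fraction(1, 2), 'r4': Fraction(1, 2)}
```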



Proposition 4. If we use parameter estimation to obtain SCFG $G$ using the estimation formula above, under the stated assumptions, then $G$ is consistent, i.e. $q^G = 1$, and furthermore the PPS $x = P_G(x)$ is non-critical, i.e., $\rho(B_G(1)) < 1$.

A first step toward establishing Proposition 4 is the following Lemma, from which we derive a (left) cone vector for $B_G(1)$, which ultimately allows us to show $\rho(B_G(1)) < 1$.

Lemma 13. Let $S$ denote the designated start nonterminal. Then

$$e^S = (I - B_G(1)^T)\Big(\sum_\pi \mathcal{P}(\pi)\kappa(\pi)\Big)$$

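Proof. Firstly, we need to relate $B_G(1)$ to the probabilities of the rules. Given a rule $A \to \gamma$, we define $B_{A\to\gamma}(x) := B_{G_{A\to\gamma}}(x)$, where $G_{A\to\gamma}$ is an SCFG with the same non-terminals and terminals as $G$ but with only one rule, $A \to \gamma$, which has probability 1. So then $B_{A\to\gamma}(1)$ is zero outside the $A$ row. We allow that $G$ may or may not be in normal form. We can say that

$$P_G(x)_A = \sum_{(A\to\gamma)\in R_A} p(A\to\gamma)\,x^{\kappa(\gamma)}, \quad\text{where } x^{\kappa(\gamma)} := \prod_{B\in V}x_B^{\kappa_B(\gamma)}.$$

In terms of the "partial" SCFGs, $G_r$, associated with each rule $r \in R$, this says $P_G(x)_A = \sum_{r\in R_A}p(r)P_{G_r}(x)_A$. The $A$ row of $B_G(x)$ is then $\sum_{r\in R_A}p(r)B_r(x)_A$. Since $B_{A\to\gamma}(x)$ is zero outside of the $A$ row, $B_G(x) = \sum_A\sum_{r\in R_A}p(r)B_r(x)$. That is:

$$B_G(x) = \sum_{r\in R}p(r)B_r(x) \qquad (6)$$

So we can obtain $B_G(1)$ from each of the $B_r(1)$. $B_{A\to\gamma}(x)$ is zero except in the $A$ row. For any non-terminal $B$,

$$B_{A\to\gamma}(x)_{A,B} = \frac{\partial x^{\kappa(\gamma)}}{\partial x_B} = \kappa_B(\gamma)\,x_B^{\kappa_B(\gamma)-1}\prod_{C\ne B}x_C^{\kappa_C(\gamma)}.$$

Evaluated at $1$, this yields:

$$(B_{A\to\gamma}(1))_{A,B} = \kappa_B(\gamma) \qquad (7)$$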

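Now we look at what happens to the count of non-terminals in the derivation $\pi$. We have $S \Rightarrow^* w$ for some $w \in \Sigma^*$. That is, $\pi = r_1r_2\ldots r_m \in R^*$, and $\alpha_0 \Rightarrow \alpha_1 \Rightarrow \alpha_2 \Rightarrow \cdots \Rightarrow \alpha_m$, for $\alpha_0 = S$, $\alpha_m = w$ and some $\alpha_1, \alpha_2, \ldots, \alpha_{m-1} \in (V \cup \Sigma)^*$.

Consider $\alpha_i \Rightarrow \alpha_{i+1}$ for some $0 \le i \le m - 1$. The rule is $A_i \to \gamma_i$ for some non-terminal $A_i$ and some string $\gamma_i$. Replacing $A_i$ by $\gamma_i$ affects the counts of the non-terminals by $\kappa(\alpha_{i+1}) - \kappa(\alpha_i) = \kappa(\gamma_i) - e^{A_i}$. Note that for any nonterminal $A$, and rule $A \to \gamma$, we have $B_{A\to\gamma}(1)^Te^A = \kappa(\gamma)$, by equation (7), so

$$(I - B_{A\to\gamma}(1)^T)e^A = e^A - \kappa(\gamma) \qquad (8)$$

Since for any string $w \in \Sigma^*$ we have $\kappa(w) = 0$, we get:

$$\begin{aligned}
e^S &= \kappa(\alpha_0) - \kappa(\alpha_m) = \sum_{i=0}^{m-1}\big(\kappa(\alpha_i) - \kappa(\alpha_{i+1})\big) = \sum_{i=0}^{m-1}\big(e^{A_i} - \kappa(\gamma_i)\big)\\
&= \sum_A\sum_{(A\to\gamma)\in R_A}C(A\to\gamma,\pi)\,(e^A - \kappa(\gamma))\\
&= \sum_A\sum_{(A\to\gamma)\in R_A}C(A\to\gamma,\pi)\,(I - B_{A\to\gamma}(1)^T)e^A \qquad \text{(by (8))}
\end{aligned}$$

This is true for any complete derivation $\pi$, so we can use the probability distribution $\mathcal{P}(\pi)$, which has $\sum_\pi\mathcal{P}(\pi) = 1$, to obtain:

$$\begin{aligned}
e^S &= \sum_\pi\mathcal{P}(\pi)\sum_A\sum_{(A\to\gamma)\in R_A}C(A\to\gamma,\pi)\,(I - B_{A\to\gamma}(1)^T)e^A\\
&= \sum_A\Big(\sum_{(A\to\gamma)\in R_A}\sum_\pi\mathcal{P}(\pi)\,C(A\to\gamma,\pi)\,(I - B_{A\to\gamma}(1)^T)\Big)e^A\\
&= (I - B_G(1)^T)\Big(\sum_\pi\mathcal{P}(\pi)\kappa(\pi)\Big)
\end{aligned}$$

For the last equality, substitute $\sum_\pi\mathcal{P}(\pi)C(A\to\gamma,\pi) = p(A\to\gamma)\sum_\pi\mathcal{P}(\pi)C(A,\pi)$ (the estimation formula), and use $\sum_{r\in R_A}p(r) = 1$, which follows directly from that formula, together with $\sum_{r\in R_A}p(r)B_r(1)^Te^A = B_G(1)^Te^A$, which follows from (6).

□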
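Lemma 13 can be verified exactly on a small example. The sketch below uses a made-up grammar ($S \to Ab \mid a$, $A \to Sa \mid c$, upper-case letters being non-terminals) and a made-up distribution over three complete derivations; it estimates the rule probabilities, builds $B_G(1)$ from equations (6) and (7), forms $v = \sum_\pi\mathcal{P}(\pi)\kappa(\pi)$, and checks $(I - B_G(1)^T)v = e^S$ with exact rationals.

```python
from collections import Counter
from fractions import Fraction

# Made-up grammar: rule name -> (lhs, rhs); upper-case letters are non-terminals.
RULES = {"r1": ("S", "Ab"), "r2": ("S", "a"), "r3": ("A", "Sa"), "r4": ("A", "c")}
NTS = ["S", "A"]
# Made-up distribution P(pi) over complete derivations from S (lists of rules used).
DERIVS = [
    (Fraction(1, 2), ["r2"]),              # S => a
    (Fraction(1, 4), ["r1", "r4"]),        # S => Ab => cb
    (Fraction(1, 4), ["r1", "r3", "r2"]),  # S => Ab => Sab => aab
]

def kappa(s):
    """Non-terminal counts kappa(s) of a string s."""
    return Counter(ch for ch in s if ch.isupper())

# v = sum_pi P(pi) kappa(pi): each use of a rule A -> gamma expands one occurrence of A
rule_mass, v = Counter(), Counter()
for prob, pi in DERIVS:
    for r in pi:
        rule_mass[r] += prob
        v[RULES[r][0]] += prob
p = {r: rule_mass[r] / v[RULES[r][0]] for r in RULES}   # the estimation formula

# B_G(1)[A][X] = sum over rules A -> gamma of p(A -> gamma) * kappa_X(gamma), by (6) and (7)
B = {A: {X: Fraction(0) for X in NTS} for A in NTS}
for r, (head, body) in RULES.items():
    for X, cnt in kappa(body).items():
        B[head][X] += p[r] * cnt

# Lemma 13: (I - B_G(1)^T) v = e^S
result = {A: v[A] - sum(B[C][A] * v[C] for C in NTS) for A in NTS}
print(result)   # {'S': Fraction(1, 1), 'A': Fraction(0, 1)}
```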

Proof (of Proposition 4). Define $v = \sum_\pi\mathcal{P}(\pi)\kappa(\pi)$. Then, by Lemma 13, we have that $v = B_G(1)^Tv + e^S$. We want to use Lemma 3 to show that $\rho(B_G(1)^T) < 1$. We can do this by applying it to the vector $u = \frac{1}{\|v\|_\infty}v$. We do not need explicit bounds on $\alpha$, $\beta$ and $u_{\min}$, but we need to show that the conditions hold for some positive $\alpha$, $\beta$ and $u_{\min}$. Firstly, we note that $v > 0$, since every non-terminal in $G$ appears in some derivation $\pi$ with $\mathcal{P}(\pi) > 0$. So $u > 0$. Since $u = \frac{1}{\|v\|_\infty}v$, $\|u\|_\infty = 1$. Note that $u = \frac{1}{\|v\|_\infty}(B_G(1)^Tv + e^S) = B_G(1)^Tu + \frac{1}{\|v\|_\infty}e^S$. Thus $B_G(1)^Tu = u - \frac{1}{\|v\|_\infty}e^S \le u$. In the $S$ coordinate (and only in the $S$ coordinate), we have that $(B_G(1)^Tu)_S = u_S - \frac{1}{\|v\|_\infty} < u_S$, so there is some $\beta > 0$ for which $(B_G(1)^Tu)_S \le (1 - \beta)u_S$. For this $\beta$, $u_S$ satisfies condition (I) of Lemma 3. We need to find an $\alpha$ for which all non-terminals other than $S$ satisfy condition (II) of Lemma 3.

Consider a non-terminal $A \ne S$. $A$ appears in some complete derivation $\pi$ with $\mathcal{P}(\pi) > 0$. There is some sequence of (not necessarily consecutive) rules $r_i : D_i \to \gamma_i$, $i = 1, \ldots, k$, appearing in that order in $\pi$, such that $D_1 = S$, $D_i$ occurs in $\gamma_{i-1}$ for all $2 \le i \le k$, and $A$ occurs in $\gamma_k$. Without loss of generality $k \le n$, since otherwise there must be $i, j$ with $1 \le i < j \le k$ such that $D_i = D_j$, and so the shorter sequence $r_1, \ldots, r_{i-1}, r_j, \ldots, r_k$ would have satisfied the above conditions. For any $1 \le i \le k - 1$, $(B_{r_i}(1))_{D_i,D_{i+1}} = \kappa_{D_{i+1}}(\gamma_i) \ge 1$, and similarly $(B_{r_k}(1))_{D_k,A} \ge 1$. Now any $r_j$, with $1 \le j \le k$, appears in $\pi$, which has $\mathcal{P}(\pi) > 0$. So $p(r_j) > 0$.



But $B_G(1) \ge p(r_j)B_{r_j}(1)$. So for any $1 \le i \le k - 1$, $(B_G(1))_{D_i,D_{i+1}} \ge p(r_i) > 0$, and similarly $(B_G(1))_{D_k,A} > 0$. So $(B_G(1)^k)_{S,A} > 0$. Then $((B_G(1)^T)^k)_{A,S} = ((B_G(1)^k)^T)_{A,S} = (B_G(1)^k)_{S,A} > 0$. We then define $\alpha_A = ((B_G(1)^T)^k)_{A,S}$. If we take $\alpha = \min_{A\in V\setminus\{S\}}\alpha_A$, then $\alpha > 0$ and all non-terminals $A \ne S$ satisfy condition (II) of Lemma 3, i.e., for each $A \ne S$, there is a $k$ with $((B_G(1)^T)^k)_{A,S} \ge \alpha$. We can now apply Lemma 3, which yields that $\rho(B_G(1)^T) < 1$. So $\rho(B_G(1)) = \rho(B_G(1)^T) < 1$. So, $G$ is not critical. Consistency of $G$, i.e., the fact that $q^G = 1$, also follows. This holds because, firstly, we can easily see that $G$ is a proper SCFG. In other words, for any nonterminal $A$, the sum of the rule probabilities is 1, because

$$\sum_{r\in R_A}p(r) = \sum_{r\in R_A}\frac{\sum_\pi\mathcal{P}(\pi)C(r,\pi)}{\sum_\pi\mathcal{P}(\pi)C(A,\pi)} = 1.$$

Thus, $G$ has a PPS, $x = P_G(x)$, such that $P_G(1) = 1$ and $\rho(B_G(1)) < 1$. Lemma 6.3 of [8] tells us that for any vectors $0 \le x \le y$, $B_G(y)(y - x) \ge P_G(y) - P_G(x)$. Let $y = 1$, and let $x = q^G$. Then we have $B_G(1)(1 - q^G) \ge P_G(1) - P_G(q^G) = 1 - q^G$, since we have argued that both $1$ and $q^G$ are fixed points of $P_G$. But $B_G(1)$ is a non-negative square matrix, and $(1 - q^G) \ge 0$. Theorem 8.3.2 of [10] tells us that for a square matrix $M \ge 0$ and a vector $v \ge 0$, if $v \ne 0$ and $Mv \ge v$, then $\rho(M) \ge 1$. We know that $B_G(1)(1 - q^G) \ge 1 - q^G$, but we have already established that $\rho(B_G(1)) < 1$. Thus it must be the case that $(1 - q^G) = 0$. In other words, $G$ is consistent. □
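Both conclusions can be checked numerically on a toy grammar: $S \to Ab$ with probability $2/5$, $S \to a$ with $3/5$, and $A \to Sa$, $A \to c$ with $1/2$ each (this is what the estimation formula returns for one made-up distribution over derivations). The sketch uses NumPy; the grammar happens to give a linear PPS, so $P_G(x) = B_G(1)x + c$. The spectral radius of $B_G(1)$ is $\sqrt{0.2} < 1$, and Kleene iteration from $0$ converges to $q^G = (1, 1)$.

```python
import numpy as np

# Toy grammar (made up): S -> Ab (2/5) | a (3/5),  A -> Sa (1/2) | c (1/2).
# Its PPS is x_S = 2/5 x_A + 3/5,  x_A = 1/2 x_S + 1/2, i.e. P_G(x) = B x + c.
B = np.array([[0.0, 0.4],
              [0.5, 0.0]])   # B_G(1): rows/columns ordered (S, A)
c = np.array([0.6, 0.5])     # probability of the rules with no non-terminal on the right

rho = max(abs(np.linalg.eigvals(B)))
x = np.zeros(2)
for _ in range(200):          # Kleene iteration x <- P_G(x) from 0 increases to q^G
    x = B @ x + c
print(rho, x)                 # rho = sqrt(0.2), about 0.447; x is (1, 1) up to rounding
```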



D A bad example for infix probabilities

We now present a family of SCFGs, $G_n$, of size $O(n)$ and with critical depth $n$, and we give a fixed 3-state DFA, $D$. We use these to indicate why it is likely to be difficult to overcome the exponential dependence on the critical depth of the given SCFG, $G$, in order to obtain a P-time algorithm for computing the probability (within desired precision) that an arbitrary $G$ generates a string in $L(D)$.

The DFA $D$ is depicted in Figure 1. It has only 3 states, and the property it checks is whether $aa$ is an "infix" of the string. In other words, $L(D) = \{waaw' \mid w \in \Sigma^* \text{ and } w' \in \Sigma^*\}$. The family of SCFGs $G_n$ is defined by the following rules:



Fig. 1. Automaton for the infix aa



^0 


0.5^ 






0.5^ 


Ai 


Ai 


0.5^ 


AiAi 


Ai 


0.5^ 


A2 



An — )• CaBnttC 

$B_n \to B_{n-1}B_{n-1}$
$B_0 \to \epsilon$
Proposition 7. $q^{G_n} = 1$. In other words, the probability of termination (generating a finite string) starting at any nonterminal in $G_n$ is 1.
Furthermore, $q^{G_n\otimes D}_{(t_1 A_0 t_3)} = \frac12$ is the probability that this SCFG $G_n$, starting at $A_0$, generates a string which has infix $aa$. On the other hand, $q^{G_n\otimes D}_{(t_1 A_n t_3)} = 2^{-2^n}$ is the same probability, starting at $A_n$.

The proof of this proposition is not at all difficult (using simple induction, and the formula for solving quadratic equations).

Let us argue why this causes severe difficulties for the approximate computation of $q^{G_n\otimes D}$. Note that $q^{G_n\otimes D}_{(t_1 A_0 t_3)} = \frac12$, whereas $q^{G_n\otimes D}_{(t_1 A_n t_3)}$ is the much smaller probability given in Proposition 7. However, in the product MPS $y = P_{G_n\otimes D}(y)$ the variable $y_{(t_1 A_0 t_3)}$ depends on the variable $y_{(t_1 A_n t_3)}$, and furthermore, if we, for example, "under-approximate" $q^{G_n\otimes D}_{(t_1 A_n t_3)}$, and instead set $y_{(t_1 A_n t_3)} := 0$, or, what effectively achieves the same result, if we change the product MPS by setting $P_{G_n\otimes D}(y)_{(t_1 A_n t_3)} = 0$, then in the resulting modified MPS, with new LFP $q'$, we would get $q'_{(t_1 A_0 t_3)} = 0$.

Likewise, one can show that if we "over-approximate" $q^{G_n\otimes D}_{(t_1 A_n t_3)}$, even very slightly, by setting $P_{G_n\otimes D}(y)_{(t_1 A_n t_3)}$ to a slightly larger value in a consistent way, then we will end up with a new LFP $q''$ such that $q''_{(t_1 A_0 t_3)} \approx 1$ (in other words, very close to 1).

In both cases, the resulting approximate solution for $(t_1 A_0 t_3)$ is terribly far from the actual solution $\frac12$. (Note that this is irrespective of the algorithm that is used to compute the other probabilities.)

Furthermore, we cannot in any way use the fact that we can detect in P-time and remove variables $x_A$ from the PPS $x = P_{G_n}(x)$ for which $q^{G_n}_A = 1$, because indeed $q^{G_n} = 1$, and yet in the product $q^{G_n\otimes D}$ there are coordinates with wildly different probabilities that we wish to compute.
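The same kind of sensitivity is already visible in a simpler setting. The sketch below is a made-up chain of critical equations, not the family $G_n$ itself: $x_k = \frac12x_k^2 + \frac12x_{k-1}$ for $k = 1, \ldots, n$, where each level's LFP is the least root $1 - \sqrt{1 - x_{k-1}}$. Lowering the bottom input $x_0$ from 1 by $\eta$ lowers the top value by exactly $\eta^{1/2^n}$, so with $\eta = 10^{-300}$ and $n = 10$ the top moves by about $0.51$, in line with the $\delta^{1/2^c}$ behaviour in Theorem 9. It uses Python's `decimal` module with 400 significant digits so that $1 - 10^{-300}$ is represented exactly.

```python
from decimal import Decimal, getcontext

getcontext().prec = 400   # enough digits to represent 1 - 1e-300 exactly

def top_value(bottom, depth):
    """Hypothetical chain of critical equations x_k = x_k^2/2 + x_{k-1}/2 (k = 1..depth)
    with x_0 = bottom; each level's LFP is the least root 1 - sqrt(1 - x_{k-1})."""
    x = bottom
    for _ in range(depth):
        x = 1 - (1 - x).sqrt()
    return x

for depth in (1, 5, 10):
    exact = top_value(Decimal(1), depth)                    # unperturbed chain: stays at 1
    perturbed = top_value(1 - Decimal("1e-300"), depth)     # bottom lowered by 1e-300
    print(depth, float(exact - perturbed))                  # equals 1e-300 ** (1 / 2**depth)
```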



